Abstract. The World Wide Web currently evolves into a Web of Linked Data
where content providers publish and link data as they have done with hypertext
for the last 20 years. While the declarative query language SPARQL is the de
facto standard for querying a priori defined sets of data from the Web, no language
exists for querying the Web of Linked Data itself. However, it seems natural to ask
whether SPARQL is also suitable for such a purpose.

In this paper we formally investigate the applicability of SPARQL as a query language
for Linked Data on the Web. In particular, we study two query models: 1) a
full-Web semantics where the scope of a query is the complete set of Linked Data
on the Web and 2) a family of reachability-based semantics which restrict the
scope to data that is reachable by traversing certain data links. For both models
we discuss properties such as monotonicity and computability as well as the
implications of querying a Web that is infinitely large due to data generating servers.

1 Introduction

The emergence of vast amounts of RDF data on the WWW has spawned research on
storing and querying large collections of such data efficiently. The prevalent query
language in this context is SPARQL [16] which defines queries as functions over an RDF
dataset, that is, a fixed, a priori defined collection of sets of RDF triples. This definition
naturally fits the use case of querying a repository of RDF data copied from the Web.

However, most RDF data on the Web is published following the Linked Data
principles [5], contributing to the emerging Web of Linked Data [6]. This practice allows
for query approaches that access the most recent version of remote data on demand.
More importantly, query execution systems may automatically discover new data by
traversing data links. As a result, such a system answers queries based on data that is
not only up-to-date but may also include initially unknown data. These features are the
foundation for true serendipity, which we regard as the most distinguishing advantage
of querying the Web itself, instead of a predefined, bounded collection of data.

While several research groups work on systems that evaluate SPARQL basic graph
patterns over the Web of Linked Data (cf. [9], [10,12] and [13,14]), we notice a shortage
of work on theoretical foundations and properties of such queries. Furthermore, there is
a need to support queries that are more expressive than conjunctive (basic graph pattern
based) queries [17]. However, it seems natural to assume that SPARQL could be used
in this context because the Web of Linked Data is based on the RDF data model and
SPARQL is a query language for RDF data. In this paper we challenge this assumption.



This report presents an extended version of a paper published in ESWC 2012 [11]. The
extended version contains proofs for all technical results in the paper (cf. Appendix C).



Contributions In this paper we understand queries as functions over the Web of Linked 
Data as a whole. To analyze the suitability of SPARQL as a language for such queries, 
we have to adjust the semantics of SPARQL. More precisely, we have to redefine the 
scope for evaluating SPARQL algebra expressions. In this paper we discuss two ap- 
proaches for such an adjustment. The first approach uses a semantics where the scope 
of a query is the complete set of Linked Data on the Web. We call this semantics full- 
Web semantics. The second approach introduces a family of reachability-based seman- 
tics which restrict the scope to data that is reachable by traversing certain data links. 
We emphasize that both approaches allow for query results that are based on data from 
initially unknown sources and, thus, enable applications to tap the full potential of the 
Web. Nevertheless, both approaches precisely define the (expected) result for any query. 
As a prerequisite for defining the aforementioned semantics and for studying the- 
oretical properties of queries under these semantics, we introduce a theoretical frame- 
work. The basis of this framework is a data model that captures the idea of a Web of 
Linked Data. We model such a Web as an infinite structure of documents that contain 
RDF data and that are interlinked via this data. Our model allows for infiniteness be- 
cause the number of entities described in a Web of Linked Data may be infinite; so may 
the number of documents. The following example illustrates such a case: 

Example 1. Let $u_i$ denote an HTTP scheme based URI that identifies the natural
number $i$. There is a countably infinite number of such URIs. The WWW server which
is responsible for these URIs may be set up to provide a document for each natural
number. These documents may be generated upon request and may contain RDF data
including the RDF triple ($u_i$, http://.../next, $u_{i+1}$). This triple associates the natural
number $i$ with its successor $i+1$ and, thus, links to the data about $i+1$ [19]. An example
for such a server is provided by the Linked Open Numbers project¹.
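
A data generating server of this kind can be sketched in a few lines. The following Python sketch is only an illustration: the base URI `http://example.org/numbers/` and the predicate URI `http://example.org/next` are hypothetical placeholders and not the URIs used by the Linked Open Numbers project.

```python
# Hypothetical URIs, chosen only for illustration.
BASE = "http://example.org/numbers/"
NEXT = "http://example.org/next"


def uri_for(i: int) -> str:
    """Return the (hypothetical) URI u_i that identifies the natural number i."""
    return f"{BASE}{i}"


def generate_document(i: int) -> set:
    """Generate, upon request, the RDF data of the document for number i.

    The document contains the triple (u_i, next, u_{i+1}), which links to
    the data about the successor of i.
    """
    return {(uri_for(i), NEXT, uri_for(i + 1))}


# No document is stored in advance; each one is computed when it is requested.
doc_3 = generate_document(3)
```

Because `generate_document` is defined for every natural number, the server describes infinitely many entities although each single document is finite.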

In addition to the data model our theoretical framework comprises a computation model. 
This model is based on a particular type of Turing machine which formally captures the 
limited data access capabilities of computations over the Web. 
We summarize the main contributions of this paper as follows: 

- We present a data model and a computation model that provide a theoretical frame- 
work to define and to study query languages for the Web of Linked Data. 

- We introduce a full-Web semantics and a family of reachability-based semantics
for a (hypothetical) use of SPARQL as a language for queries over Linked Data.

- We systematically analyze SPARQL queries under the semantics that we introduce. 
This analysis includes a discussion of satisfiability, monotonicity, and computabil- 
ity of queries under the different semantics, a comparison of the semantics, and a 
study of the implications of querying a Web of Linked Data that is infinite. 

Related Work Since its emergence the WWW has attracted research on declarative
query languages for the Web. For an overview on early work in this area we refer to [8].
Most of this work understands the WWW as a hypertext Web. Nonetheless, some of the
foundational work can be adopted for research on Linked Data. The computation model
that we use in this paper is an adaptation of the ideas presented in [1] and [15].



¹ http://km.aifb.kit.edu/projects/numbers/



In addition to the early work on Web queries, query execution over Linked Data
on the WWW has attracted much attention recently [9,10,12,13,14]. However, existing
work primarily focuses on various aspects of (query-local) data management, query
execution, and optimization. The only work we are aware of that aims to formally capture
the concept of Linked Data and to provide a well-defined semantics for queries in this
context is Bouquet et al.'s [7]. They define three types of query methods for conjunctive
queries: a bounded method which only uses RDF data referred to in queries, a direct
access method which assumes an oracle that provides all RDF graphs which are
"relevant" for a given query, and a navigational method which corresponds to a particular
reachability-based semantics. For the latter Bouquet et al. define a notion of reachability
that allows a query execution system to follow all data links. As a consequence, the
semantics of queries using this navigational method is equivalent to what we call
$c_{\mathsf{All}}$-semantics (cf. Section 5.1); it is the most general of our reachability-based semantics.
Bouquet et al.'s navigational query model does not support other, more restrictive
notions of reachability, as is possible with our model. Furthermore, Bouquet et al. do not
discuss full SPARQL, theoretical properties of queries, or the infiniteness of the WWW.

While we focus on the query language SPARQL in the context of Linked Data on the
Web, the theoretical properties of SPARQL as a query language for a fixed, predefined
collection of RDF data are well understood today [2,3,16,18]. Particularly interesting
in our context are semantical equivalences between SPARQL expressions [18] because
these equivalences may also be used for optimizing SPARQL queries over Linked Data.

Structure of the paper The remainder of this paper is organized as follows.
Section 2 introduces the preliminaries for our work. In Section 3 we present the data model
and the computation model. Sections 4 and 5 discuss the full-Web semantics and the
reachability-based semantics for SPARQL, respectively. We conclude the paper in
Section 6. For full technical proofs of all results in this paper we refer to Appendix C.



2 Preliminaries 

This section provides a brief introduction of RDF and the query language SPARQL. 

We assume pairwise disjoint, countably infinite sets $\mathcal{U}$ (all HTTP scheme based
URIs²), $\mathcal{B}$ (blank nodes), $\mathcal{L}$ (literals), and $\mathcal{V}$ (variables, denoted by a leading '?'
symbol). An RDF triple $t$ is a tuple $(s,p,o) \in (\mathcal{U} \cup \mathcal{B}) \times \mathcal{U} \times (\mathcal{U} \cup \mathcal{B} \cup \mathcal{L})$. For any RDF triple
$t = (s,p,o)$ we define $\mathrm{terms}(t) = \{s,p,o\}$ and $\mathrm{uris}(t) = \mathrm{terms}(t) \cap \mathcal{U}$. Overloading
function terms, we write $\mathrm{terms}(G) = \bigcup_{t \in G} \mathrm{terms}(t)$ for any (potentially infinite) set
$G$ of RDF triples. In contrast to the usual formalization of RDF we allow for infinite
sets of RDF triples which we require to study infinite Webs of Linked Data.
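
The functions terms and uris can be illustrated with a minimal Python sketch for finite sets of triples. The string conventions used here are assumptions of the sketch only: a term that starts with `http://` is treated as a URI, a term that starts with `_:` as a blank node, and any other string as a literal.

```python
def is_uri(term: str) -> bool:
    # Assumption of this sketch: URIs are strings with an HTTP scheme prefix.
    return term.startswith("http://")


def terms(x) -> set:
    """terms(t) for a single triple t (a tuple), or terms(G) for a finite set G."""
    if isinstance(x, tuple):
        return set(x)
    return {term for triple in x for term in triple}


def uris(t: tuple) -> set:
    """uris(t): the terms of triple t that are URIs."""
    return {term for term in terms(t) if is_uri(term)}


t = ("http://example.org/a", "http://example.org/p", "_:b1")
G = {t, ("_:b1", "http://example.org/q", "some literal")}
```

An infinite set of triples cannot be materialized this way; the sketch only mirrors the definitions on finite inputs.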

In this paper we focus on the core fragment of SPARQL discussed by Pérez et
al. [16] and we adopt their formalization approach, that is, we use the algebraic syntax
and the compositional set semantics introduced in [16]. SPARQL expressions are
defined recursively: i) A triple pattern $(s,p,o) \in (\mathcal{V} \cup \mathcal{U}) \times (\mathcal{V} \cup \mathcal{U}) \times (\mathcal{V} \cup \mathcal{U} \cup \mathcal{L})$ is
a SPARQL expression³. ii) If $P_1$ and $P_2$ are SPARQL expressions, then ($P_1$ AND $P_2$),
($P_1$ UNION $P_2$), ($P_1$ OPT $P_2$), and ($P_1$ FILTER $R$) are SPARQL expressions where $R$ is a
filter condition. For a formal definition of filter conditions we refer to [16]. To denote the
set of all variables in all triple patterns of a SPARQL expression $P$ we write $\mathrm{vars}(P)$.

² For the sake of simplicity we assume in this paper that URIs are HTTP scheme based URIs.
However, our models and results may be extended easily for all possible types of URIs.

To define the semantics of SPARQL we introduce valuations, that is, partial
mappings $\mu : \mathcal{V} \to \mathcal{U} \cup \mathcal{B} \cup \mathcal{L}$. The evaluation of a SPARQL expression $P$ over a potentially
infinite set $G$ of RDF triples, denoted by $[\![P]\!]_G$, is a set of valuations. In contrast to the
usual case, this set may be infinite in our scenario. The evaluation function $[\![\cdot]\!]_\cdot$ is
defined recursively over the structure of SPARQL expressions. Due to space limitations,
we do not reproduce the full formal definition of $[\![\cdot]\!]_\cdot$ here. Instead, we refer the reader
to the definitions given by Pérez et al. [16]; even if Pérez et al. define $[\![\cdot]\!]_\cdot$ for finite sets
of RDF triples, it is trivial to extend their formalism for infiniteness (cf. Appendix B).

A SPARQL expression $P$ is monotonic if for any pair $G_1, G_2$ of (potentially infinite)
sets of RDF triples such that $G_1 \subseteq G_2$, it holds that $[\![P]\!]_{G_1} \subseteq [\![P]\!]_{G_2}$. A SPARQL
expression $P$ is satisfiable if there exists a (potentially infinite) set $G$ of RDF triples such
that $[\![P]\!]_G \neq \emptyset$. It is trivial to show that any non-satisfiable expression is monotonic.

In addition to the traditional notion of satisfiability we shall need a more restrictive
notion for the discussion in this paper: A SPARQL expression $P$ is nontrivially
satisfiable if there exists a (potentially infinite) set $G$ of RDF triples and a valuation $\mu$ such
that i) $\mu \in [\![P]\!]_G$ and ii) $\mu$ provides a binding for at least one variable; i.e. $\mathrm{dom}(\mu) \neq \emptyset$.

Example 2. Let $P_{\mathrm{Ex.2}} = tp$ be a SPARQL expression that consists of a single triple
pattern $tp = (u_1, u_2, u_3)$ where $u_1, u_2, u_3 \in \mathcal{U}$; hence, $tp$ actually is an RDF triple. For
any set $G$ of RDF triples for which $(u_1, u_2, u_3) \in G$ it is easy to see that the evaluation
of $P_{\mathrm{Ex.2}}$ over $G$ contains a single, empty valuation $\mu_\emptyset$, that is, $[\![P_{\mathrm{Ex.2}}]\!]_G = \{\mu_\emptyset\}$ where
$\mathrm{dom}(\mu_\emptyset) = \emptyset$. In contrast, for any other set $G$ of RDF triples it holds $[\![P_{\mathrm{Ex.2}}]\!]_G = \emptyset$.
Hence, $P_{\mathrm{Ex.2}}$ is not nontrivially satisfiable (although it is satisfiable).
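
The distinction between an empty result and a result that contains only the empty valuation can be made concrete with a small sketch. The following Python code evaluates a single triple pattern over a finite set of triples; representing valuations as dictionaries and terms as plain strings such as `"u1"` are assumptions of this sketch, not part of the formalization by Pérez et al.

```python
def is_var(term: str) -> bool:
    return term.startswith("?")


def eval_triple_pattern(tp: tuple, G: set) -> list:
    """Evaluate triple pattern tp over a finite set G of triples.

    Returns a list of valuations, each a dict from variables to RDF terms.
    """
    result = []
    for triple in G:
        mu = {}
        matches = True
        for pattern_term, data_term in zip(tp, triple):
            if is_var(pattern_term):
                # A variable that occurs twice must be bound consistently.
                if mu.setdefault(pattern_term, data_term) != data_term:
                    matches = False
                    break
            elif pattern_term != data_term:
                matches = False
                break
        if matches and mu not in result:
            result.append(mu)
    return result


tp = ("u1", "u2", "u3")  # a triple pattern without variables
G_with = {("u1", "u2", "u3"), ("u1", "u2", "u4")}
G_without = {("u1", "u2", "u4")}
with_result = eval_triple_pattern(tp, G_with)        # only the empty valuation
without_result = eval_triple_pattern(tp, G_without)  # no valuation at all
```

In the first case the result is a one-element list that holds the empty dictionary, in the second case the list itself is empty.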

3 Modeling a Web of Linked Data 

In this section we introduce theoretical foundations which shall allow us to define and 
to analyze query models for Linked Data. In particular, we propose a data model and 
introduce a computation model. For these models we assume a static view of the Web; 
that is, no changes are made to the data on the Web during the execution of a query. 

3.1 Data Model 

We model the Web of Linked Data as a potentially infinite structure of interlinked doc- 
uments. Such documents, which we call Linked Data documents, or LD documents for 
short, are accessed via URIs and contain data that is represented as a set of RDF triples. 

Definition 1. Let $\mathcal{T} = (\mathcal{U} \cup \mathcal{B}) \times \mathcal{U} \times (\mathcal{U} \cup \mathcal{B} \cup \mathcal{L})$ be the infinite set of all possible
RDF triples. A Web of Linked Data is a tuple $W = (D, \mathit{data}, \mathit{adoc})$ where:

- $D$ is a (finite or countably infinite) set of symbols that represent LD documents.

- data is a total mapping $\mathit{data} : D \to 2^{\mathcal{T}}$ such that $\forall\, d \in D$ : $\mathit{data}(d)$ is finite and
$\forall\, d_1, d_2 \in D : d_1 \neq d_2 \Rightarrow \mathrm{terms}(\mathit{data}(d_1)) \cap \mathcal{B} \neq \mathrm{terms}(\mathit{data}(d_2)) \cap \mathcal{B}$.

- adoc is a partial, surjective mapping $\mathit{adoc} : \mathcal{U} \to D$.

³ For the sake of a more straightforward formalization we do not permit blank nodes in triple
patterns. In practice, each blank node in a SPARQL query can be replaced by a new variable.

While the three elements D, data, and adoc completely define a Web of Linked Data 
in our model, we point out that these elements are abstract concepts and, thus, are not 
available to a query execution system. However, by retrieving LD documents, such a 
system may gradually obtain information about the Web. Based on this information the 
system may (partially) materialize these three elements. In the following we discuss the 
three elements and introduce additional concepts that we need to define queries. 
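
For a finite Web, a (partial) materialization of the three elements can be sketched as a small Python data structure. The class name `WebOfLinkedData`, the method `check`, and the example URIs are hypothetical; the sketch tests only totality of data and surjectivity of adoc and leaves out the condition on blank nodes.

```python
from dataclasses import dataclass


@dataclass
class WebOfLinkedData:
    D: set       # symbols that represent LD documents
    data: dict   # total mapping: document -> finite set of RDF triples
    adoc: dict   # partial mapping: URI -> document (the keys form dom(adoc))

    def check(self) -> bool:
        """Check the conditions that can be tested on a finite instance."""
        data_is_total = set(self.data) == self.D
        adoc_is_surjective = set(self.adoc.values()) == self.D
        return data_is_total and adoc_is_surjective


web = WebOfLinkedData(
    D={"d1", "d2"},
    data={
        "d1": {("http://example.org/a", "http://example.org/p", "http://example.org/b")},
        "d2": set(),
    },
    adoc={"http://example.org/a": "d1", "http://example.org/b": "d2"},
)
```

A URI that is missing from the `adoc` dictionary plays the role of a URI outside dom(adoc), which is how the partiality of the mapping is represented here.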

We say a Web of Linked Data $W = (D, \mathit{data}, \mathit{adoc})$ is finite if and only if $D$ is
finite; otherwise, $W$ is infinite. Our model allows for infiniteness to cover cases where
Linked Data about an infinite number of identifiable entities is generated on the fly. The
Linked Open Numbers project (cf. Example 1) illustrates that such cases are possible in
practice. Another example is the LinkedGeoData project⁴ which provides Linked Data
about any circular and rectangular area on Earth [2]. Covering these cases enables us to
model queries over such data and analyze the effects of executing such queries.

Even if a Web of Linked Data $W = (D, \mathit{data}, \mathit{adoc})$ is infinite, Definition 1 requires
countability for $D$. We emphasize that this requirement does not restrict us in modeling
the WWW as a Web of Linked Data: In the WWW we use URIs to locate documents
that contain Linked Data. Even if URIs are not limited in length, they are words over a
finite alphabet. Thus, the infinite set of all possible URIs is countable, as is the set of all
documents that may be retrieved using URIs.
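
The countability argument can be made tangible by enumerating all words over a finite alphabet in order of increasing length. The Python sketch below uses the toy alphabet `"ab"` as an assumption; the set of characters permitted in URIs is larger but still finite, so the same enumeration applies.

```python
from itertools import count, islice, product


def enumerate_words(alphabet: str):
    """Yield every nonempty word over a finite alphabet, shortest words first."""
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


# Every word appears at some finite position of this (infinite) enumeration.
first_words = list(islice(enumerate_words("ab"), 6))
```

Since each word is reached after finitely many steps, the enumeration assigns a natural number to every word, which is what countability requires.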

The mapping data associates each LD document $d \in D$ in a Web of Linked Data
$W = (D, \mathit{data}, \mathit{adoc})$ with a finite set of RDF triples. In practice, these triples are
obtained by parsing $d$ after $d$ has been retrieved from the Web. The actual retrieval
mechanism is not relevant for our model. However, as prescribed by the RDF data model,
Definition 1 requires that the data of each $d \in D$ uses a unique set of blank nodes.

To denote the (potentially infinite but countable) set of all RDF triples in $W$ we
write $\mathrm{AllData}(W)$; i.e. it holds: $\mathrm{AllData}(W) = \bigcup_{d \in D} \mathit{data}(d)$.

Since we use URIs as identifiers for entities, we say that an LD document $d \in D$
describes the entity identified by URI $u \in \mathcal{U}$ if there exists $(s,p,o) \in \mathit{data}(d)$ such that
$s = u$ or $o = u$. Notice, there might be multiple LD documents that describe an entity
identified by $u$. However, according to the Linked Data principles, each $u \in \mathcal{U}$ may also
serve as a reference to a specific LD document which is considered as an authoritative
source of data about the entity identified by $u$. We model the relationship between
URIs and authoritative LD documents by mapping adoc. Since some LD documents
may be authoritative for multiple entities, we do not require injectivity for adoc. The
"real world" mechanism for dereferencing URIs (i.e. learning about the location of the
authoritative LD document) is not relevant for our model. For each $u \in \mathcal{U}$ that cannot
be dereferenced (i.e. "broken links") or that is not used in $W$ it holds $u \notin \mathrm{dom}(\mathit{adoc})$.

⁴ http://linkedgeodata.org

A URI $u \in \mathcal{U}$ with $u \in \mathrm{dom}(\mathit{adoc})$ that is used in the data of an LD document $d_1 \in D$
constitutes a data link to the LD document $d_2 = \mathit{adoc}(u) \in D$. These data links form a
graph structure which we call link graph. The vertices in such a graph represent the LD
documents of the corresponding Web of Linked Data; edges represent data links.
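
For a finite Web, the edges of the link graph can be derived directly from data and adoc. The following Python sketch assumes that both mappings are given as dictionaries and that terms are plain strings; the names `dA`, `uA`, and so on are hypothetical.

```python
def link_graph(data: dict, adoc: dict) -> set:
    """Return the edges (d1, d2) of the link graph.

    data maps each document to its set of triples; adoc maps each
    dereferenceable URI to its authoritative document.
    """
    edges = set()
    for d1, triples in data.items():
        for triple in triples:
            for term in triple:
                if term in adoc:  # term is a URI u with u in dom(adoc)
                    edges.add((d1, adoc[term]))
    return edges


data = {"dA": {("uA", "p", "uB")}, "dB": {("uB", "p", "some literal")}}
adoc = {"uA": "dA", "uB": "dB"}
edges = link_graph(data, adoc)
```

The sketch also records a data link from a document to itself whenever a document mentions a URI for which it is authoritative, because the definition does not exclude that case.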

To study the monotonicity of queries over a Web of Linked Data we require a con- 
cept of containment for such Webs. For this purpose, we introduce the notion of an 
induced subweb which resembles the concept of induced subgraphs in graph theory. 

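Definition 2. Let $W = (D, \mathit{data}, \mathit{adoc})$ and $W' = (D', \mathit{data}', \mathit{adoc}')$ be Webs of Linked
Data. $W'$ is an induced subweb of $W$ if i) $D' \subseteq D$, ii) $\forall\, d \in D' : \mathit{data}'(d) = \mathit{data}(d)$,
and iii) $\forall\, u \in \mathcal{U}_{D'} : \mathit{adoc}'(u) = \mathit{adoc}(u)$ where $\mathcal{U}_{D'} = \{u \in \mathcal{U} \mid \mathit{adoc}(u) \in D'\}$.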

It can be easily seen from Definition 2 that specifying $D'$ is sufficient to unambiguously
define an induced subweb $(D', \mathit{data}', \mathit{adoc}')$ of a given Web of Linked Data.
Furthermore, it is easy to verify that for an induced subweb $W'$ of a Web of Linked Data $W$ it
holds $\mathrm{AllData}(W') \subseteq \mathrm{AllData}(W)$.
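
Both observations can be checked on a finite example. The Python sketch below builds the induced subweb from a chosen subset of documents and compares the sets of all triples; the dictionary-based representation and all names in it are assumptions of the sketch.

```python
def induced_subweb(D: set, data: dict, adoc: dict, D_sub: set):
    """Build the induced subweb (D', data', adoc') that is determined by D' = D_sub."""
    if not D_sub.issubset(D):
        raise ValueError("D' must be a subset of D")
    data_sub = {d: data[d] for d in D_sub}
    adoc_sub = {u: d for u, d in adoc.items() if d in D_sub}
    return D_sub, data_sub, adoc_sub


def all_data(data: dict) -> set:
    """AllData: the union of data(d) over all documents d (finite case only)."""
    return set().union(*data.values())


D = {"d1", "d2"}
data = {"d1": {("a", "p", "b")}, "d2": {("b", "p", "c")}}
adoc = {"a": "d1", "b": "d2"}
D_sub, data_sub, adoc_sub = induced_subweb(D, data, adoc, {"d1"})
```

Only the set of documents is passed in; the restricted mappings follow from it, which mirrors the remark that specifying the document set is sufficient.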

In addition to the structural part, our data model introduces a general understanding 
of queries over a Web of Linked Data: 

Definition 3. Let $\mathcal{W}$ be the infinite set of all possible Webs of Linked Data (i.e. all
3-tuples that correspond to Definition 1) and let $\Omega$ be the infinite set of all possible
valuations. A Linked Data query $q$ is a total function $q : \mathcal{W} \to 2^{\Omega}$.

The notions of satisfiability and monotonicity carry over naturally to Linked Data queries:
A Linked Data query $q$ is satisfiable if there exists a Web of Linked Data $W$ such that
$q(W)$ is not empty. A Linked Data query $q$ is nontrivially satisfiable if there exists a
Web of Linked Data $W$ and a valuation $\mu$ such that i) $\mu \in q(W)$ and ii) $\mathrm{dom}(\mu) \neq \emptyset$.
A Linked Data query $q$ is monotonic if for every pair $W_1, W_2$ of Webs of Linked Data
it holds: If $W_1$ is an induced subweb of $W_2$, then $q(W_1) \subseteq q(W_2)$.

3.2 Computation Model

Usually, functions are computed over structures that are assumed to be fully (and di- 
rectly) accessible. In contrast, we focus on Webs of Linked Data in which accessibility 
is limited: To discover LD documents and access their data we have to dereference 
URIs, but the full set of those URIs for which we may retrieve documents is unknown. 
Hence, to properly analyze a query model for Webs of Linked Data we must define a 
model for computing functions on such a Web. This section introduces such a model. 

In the context of queries over a hypertext-centric view of the WWW, Abiteboul and
Vianu introduce a specific Turing machine called Web machine [1]. Mendelzon and
Milo propose a similar machine model [15]. These machines formally capture the
limited data access capabilities on the WWW and thus present an adequate abstraction for
computations over a structure such as the WWW. Based on these machines the authors 
introduce particular notions of computability for queries over the WWW. These notions 
are: (finitely) computable queries, which correspond to the traditional notion of com- 
putability; and eventually computable queries whose computation may not terminate 
but each element of the query result will eventually be reported during the computation. 
We adopt the ideas of Abiteboul and Vianu and of Mendelzon and Milo for our work. 
More precisely, we adapt the idea of a Web machine to our scenario of a Web of Linked 



Data. We call our machine a Linked Data machine (or LD machine, for short). Based on
this machine we shall define finite and eventual computability for Linked Data queries.
Encoding (fragments of) a Web of Linked Data $W = (D, \mathit{data}, \mathit{adoc})$ on the tapes
of such an LD machine is straightforward because all relevant structures, such as the
sets $D$ or $\mathcal{U}$, are countably infinite. In the remainder of this paper we write $\mathrm{enc}(x)$ to
denote the encoding of some element $x$ (e.g. a single RDF triple, a set of triples, a full
Web of Linked Data, a valuation, etc.). For a detailed definition of the encodings we use
in this paper, we refer to Appendix A. We now define LD machines:

Definition 4. An LD machine is a multi-tape Turing machine with five tapes and a
finite set of states, including a special state called expand. The five tapes include two
read-only input tapes: i) an ordinary input tape and ii) a right-infinite Web tape which
can only be accessed in the expand state; two work tapes: iii) an ordinary, two-way
infinite work tape and iv) a right-infinite link traversal tape; and v) a right-infinite,
append-only output tape. Initially, the work tapes and the output tape are empty, the
Web tape contains a (potentially infinite) word that encodes a Web of Linked Data, and
the ordinary input tape contains an encoding of further input (if any). Any LD machine
operates like an ordinary multi-tape Turing machine except when it reaches the expand
state. In this case LD machines perform the following expand procedure: The machine
inspects the word currently stored on the link traversal tape. If the suffix of this word
is the encoding $\mathrm{enc}(u)$ of some URI $u \in \mathcal{U}$ and the word on the Web tape contains
$\sharp\; \mathrm{enc}(u)\; \mathrm{enc}(\mathit{adoc}(u))\; \sharp$, then the machine appends $\mathrm{enc}(\mathit{adoc}(u))\; \sharp$ to the (right) end
of the word on the link traversal tape by copying from the Web tape; otherwise, the
machine appends $\sharp$ to the word on the link traversal tape.

Notice how any LD machine $M$ is limited in the way it may access a Web of Linked
Data $W = (D, \mathit{data}, \mathit{adoc})$ that is encoded on its Web tape: $M$ may use the data of any
particular $d \in D$ only after it performed the expand procedure using a URI $u \in \mathcal{U}$ for
which $\mathit{adoc}(u) = d$. Hence, the expand procedure simulates a URI based lookup which
conforms to the (typical) data access method on the WWW. We now use LD machines
to adapt the notion of finite and eventual computability [1] for Linked Data queries:
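
The expand procedure can be simulated with a simplified Python sketch. As assumptions of the sketch, the Web tape is modeled as a dictionary from encoded URIs to encoded documents instead of a single word, the link traversal tape is a list of tokens instead of a sequence of symbols, and the string `"#"` stands in for the separator symbol.

```python
SEP = "#"  # stands in for the separator symbol of the tape encoding


def expand(web_tape: dict, link_tape: list) -> None:
    """Perform one expand step on the link traversal tape (modified in place).

    web_tape maps enc(u) to enc(adoc(u)) for every u in dom(adoc);
    link_tape holds the tokens written so far, and its last token is
    read as the encoding of the URI to look up.
    """
    suffix = link_tape[-1] if link_tape else None
    if suffix in web_tape:
        # Successful lookup: copy enc(adoc(u)) from the Web tape.
        link_tape.append(web_tape[suffix])
    # In both cases the step ends by appending the separator.
    link_tape.append(SEP)


web_tape = {"enc(uA)": "enc(dA)"}

tape_hit = ["enc(uA)"]
expand(web_tape, tape_hit)   # lookup succeeds

tape_miss = ["enc(uX)"]
expand(web_tape, tape_miss)  # lookup fails, only the separator is appended
```

The function never reads `web_tape` except through the lookup of the current suffix, which reflects the restriction that the Web tape is accessible only in the expand state.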

Definition 5. A Linked Data query q is finitely computable if there exists an LD ma- 
chine which, for any Web of Linked Data W encoded on the Web tape, halts after a 
finite number of steps and produces a possible encoding of q{W) on its output tape. 

Definition 6. A Linked Data query $q$ is eventually computable if there exists an LD
machine whose computation on any Web of Linked Data $W$ encoded on the Web tape
has the following two properties: 1) the word on the output tape at each step of the
computation is a prefix of a possible encoding of $q(W)$ and 2) the encoding $\mathrm{enc}(\mu')$
of any $\mu' \in q(W)$ becomes part of the word on the output tape after a finite number of
computation steps.

Any machine for a non-satisfiable query may immediately report the empty result. Thus: 

Fact 1. Non-satisfiable Linked Data queries are finitely computable. 

In our analysis of SPARQL-based Linked Data queries we shall discuss decision
problems that have a Web of Linked Data $W$ as input. For such problems we assume the
computation may only be performed by an LD machine with $\mathrm{enc}(W)$ on its Web tape:



Definition 7. Let $\mathcal{W}'$ be a (potentially infinite) set of Webs of Linked Data (each of
which may be infinite itself); let $\mathcal{X}$ be an arbitrary (potentially infinite) set of finite
structures; and let $DP \subseteq \mathcal{W}' \times \mathcal{X}$. The decision problem for $DP$, that is, decide for any
$(W, X) \in \mathcal{W}' \times \mathcal{X}$ whether $(W, X) \in DP$, is LD machine decidable if there exists an LD
machine whose computation on any $W \in \mathcal{W}'$ encoded on the Web tape and any $X \in \mathcal{X}$
encoded on the ordinary input tape, has the following property: The machine halts in
an accepting state if $(W, X) \in DP$; otherwise the machine halts in a rejecting state.

Obviously, any (Turing) decidable problem that does not have a Web of Linked Data
as input, is also LD machine decidable because LD machines are Turing machines; for
these problems the corresponding set $\mathcal{W}'$ is empty.

4 Full-Web Semantics 

Based on the concepts introduced in the previous section we now define and study 
approaches that adapt SPARQL as a language for expressing Linked Data queries. 

The first approach that we discuss is full-Web semantics where the scope of each
query is the complete set of Linked Data on the Web. Hereafter, we refer to SPARQL
queries under this full-Web semantics as SPARQL$_{\mathrm{LD}}$ queries. The definition of these
queries is straightforward and makes use of SPARQL expressions and their semantics:

Definition 8. Let $P$ be a SPARQL expression. The SPARQL$_{\mathrm{LD}}$ query that uses $P$,
denoted by $Q^P$, is a Linked Data query that, for any Web of Linked Data $W$, is defined as:
$Q^P(W) = [\![P]\!]_{\mathrm{AllData}(W)}$. Each valuation $\mu \in Q^P(W)$ is a solution for $Q^P$ in $W$.

In the following we study satisfiability, monotonicity, and computability of SPARQL$_{\mathrm{LD}}$
queries and we discuss implications of querying Webs of Linked Data that are infinite.

4.1 Satisfiability, Nontrivial Satisfiability, Monotonicity, and Computability 

For satisfiability and monotonicity we may show the following dependencies. 

Proposition 1. Let $Q^P$ be a SPARQL$_{\mathrm{LD}}$ query that uses SPARQL expression $P$.

1. $Q^P$ is satisfiable if and only if $P$ is satisfiable.

2. $Q^P$ is nontrivially satisfiable if and only if $P$ is nontrivially satisfiable.

3. $Q^P$ is monotonic if and only if $P$ is monotonic.

We now discuss computability. Since all non-satisfiable SPARQL$_{\mathrm{LD}}$ queries are finitely
computable (recall Fact 1), we focus on satisfiable SPARQL$_{\mathrm{LD}}$ queries. Our first main
result shows that the computability of such queries depends on their monotonicity:

Theorem 1. If a satisfiable SPARQL$_{\mathrm{LD}}$ query is monotonic, then it is eventually
computable (but not finitely computable); otherwise, it is not even eventually computable.

In addition to a direct dependency between monotonicity and computability, Theorem 1
shows that no satisfiable SPARQL$_{\mathrm{LD}}$ query is finitely computable; instead, such
queries are at best eventually computable. The reason for this limitation is the
infiniteness of $\mathcal{U}$: To (fully) compute a satisfiable SPARQL$_{\mathrm{LD}}$ query, an LD machine requires
access to the data of all LD documents in the queried Web of Linked Data. Recall that,
initially, the machine has no information about what URI to use for performing an
expand procedure with which it may access any particular document. Hence, to ensure
that all documents have been accessed, the machine must expand all $u \in \mathcal{U}$. This
process never terminates because $\mathcal{U}$ is infinite. Notice, a real query system for the WWW
would have a similar problem: To guarantee that such a system sees all documents, it
must enumerate and look up all (HTTP scheme) URIs.
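
The behavior described here can be sketched as a Python generator that looks up one URI after another and reports every new solution as soon as it is found. This is a simplified illustration for a single triple pattern and not the construction used in the proof of Theorem 1; the functions `match` and `eventually_compute`, the URI names `u1, u2, ...`, and the toy Web are hypothetical.

```python
from itertools import count, islice


def match(tp: tuple, triple: tuple):
    """Return the valuation that maps tp to triple, or None if there is none."""
    mu = {}
    for pattern_term, data_term in zip(tp, triple):
        if pattern_term.startswith("?"):
            if mu.setdefault(pattern_term, data_term) != data_term:
                return None
        elif pattern_term != data_term:
            return None
    return mu


def eventually_compute(tp: tuple, lookup, uri_enumeration):
    """Yield each solution of triple pattern tp after finitely many lookups.

    lookup(u) returns the triples of adoc(u), or None if u is not
    dereferenceable; uri_enumeration yields every URI at some finite position.
    The generator ends only if uri_enumeration ends.
    """
    reported = []
    for u in uri_enumeration:
        triples = lookup(u)  # corresponds to one expand step
        for triple in triples or ():
            mu = match(tp, triple)
            if mu is not None and mu not in reported:
                reported.append(mu)
                yield mu


# Toy Web in which only u2 and u5 can be dereferenced.
web = {"u2": {("u2", "p", "a")}, "u5": {("u5", "p", "b")}}
all_uris = (f"u{k}" for k in count(1))  # infinite enumeration u1, u2, ...
solutions = eventually_compute(("?s", "p", "?o"), web.get, all_uris)
first_two = list(islice(solutions, 2))
```

Both solutions are reported after finitely many lookups, but requesting a third element from `solutions` would never return: the generator cannot know that no further document exists and keeps enumerating URIs.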

The computability of any Linked Data query is a general, input independent
property which covers the worst case (recall, the requirements given in Definitions 5 and 6
must hold for any Web of Linked Data). As a consequence, in certain cases the
computation of some (eventually computable) SPARQL$_{\mathrm{LD}}$ queries may still terminate:

Example 3. Let $Q^{P_{\mathrm{Ex.2}}}$ be a monotonic SPARQL$_{\mathrm{LD}}$ query which uses the SPARQL
expression $P_{\mathrm{Ex.2}} = (u_1, u_2, u_3)$ that we introduce in Example 2. Recall, $P_{\mathrm{Ex.2}}$ is satisfiable
but not nontrivially satisfiable. The same holds for $Q^{P_{\mathrm{Ex.2}}}$ (cf. Proposition 1). An LD
machine for $Q^{P_{\mathrm{Ex.2}}}$ may take advantage of this fact: As soon as the machine discovers an
LD document which contains RDF triple $(u_1, u_2, u_3)$, the machine may halt (after
reporting $\{\mu_\emptyset\}$ with $\mathrm{dom}(\mu_\emptyset) = \emptyset$ as the complete query result). In this particular case
the machine would satisfy the requirements for finite computability. However, $Q^{P_{\mathrm{Ex.2}}}$ is
still only eventually computable because there exist Webs of Linked Data that do not
contain any LD document with RDF triple $(u_1, u_2, u_3)$; any (complete) LD machine
based computation of $Q^{P_{\mathrm{Ex.2}}}$ over such a Web cannot halt (cf. proof of Theorem 1).

The example illustrates that the computation of an eventually computable query over a
particular Web of Linked Data may terminate. This observation leads us to a decision
problem which we denote as Termination(SPARQL$_{\mathrm{LD}}$). This problem takes a Web
of Linked Data $W$ and a satisfiable SPARQL$_{\mathrm{LD}}$ query $Q^P$ as input and asks whether
an LD machine exists that computes $Q^P(W)$ and halts. For discussing this problem we
note that the query in Example 3 represents a special case, that is, SPARQL$_{\mathrm{LD}}$ queries
which are satisfiable but not nontrivially satisfiable. The reason why an LD machine
for such a query may halt is the implicit knowledge that the query result is complete
once the machine identified the empty valuation $\mu_\emptyset$ as a solution. Such a completeness
criterion does not exist for any nontrivially satisfiable SPARQL$_{\mathrm{LD}}$ query:

Lemma 1. There is no nontrivially satisfiable SPARQL$_{\mathrm{LD}}$ query $Q^P$ for which
there exists an LD machine that, for any Web of Linked Data $W$ encoded on the Web tape,
halts after a finite number of computation steps and outputs an encoding of $Q^P(W)$.

Lemma 1 shows that the answer to Termination(SPARQL$_{\mathrm{LD}}$) is negative in most
cases. However, the problem in general is undecidable (for LD machines) since the
input for the problem includes queries that correspond to the aforementioned special case.

Theorem 2. Termination(SPARQL$_{\mathrm{LD}}$) is not LD machine decidable.

4.2 Querying an Infinite Web of Linked Data 

The limited computability of SPARQL$_{\mathrm{LD}}$ queries that our results in the previous section
show is a consequence of the infiniteness of $\mathcal{U}$ and not of a possible infiniteness of the
queried Web. We now focus on the implications of potentially infinite Webs of Linked
Data for SPARQL$_{\mathrm{LD}}$ queries. However, we assume a finite Web first:

Proposition 2. SPARQL$_{\mathrm{LD}}$ queries over a finite Web of Linked Data have a finite result.

The following example illustrates that a similarly general statement does not exist when 
the queried Web is infinite such as the WWW. 

Example 4. Let $W_{\mathrm{inf}} = (D_{\mathrm{inf}}, \mathit{data}_{\mathrm{inf}}, \mathit{adoc}_{\mathrm{inf}})$ be an infinite Web of Linked Data that
contains LD documents for all natural numbers (similar to the documents in
Example 1). Hence, for each natural number⁵ $k \in \mathbb{N}^+$, identified by $u_k \in \mathcal{U}$, exists an LD
document $\mathit{adoc}_{\mathrm{inf}}(u_k) = d_k \in D_{\mathrm{inf}}$ such that $\mathit{data}_{\mathrm{inf}}(d_k) = \{(u_k, \mathrm{succ}, u_{k+1})\}$ where
$\mathrm{succ} \in \mathcal{U}$ identifies the successor relation for $\mathbb{N}^+$. Furthermore, let $P_1 = (u_1, \mathrm{succ}, ?v)$
and $P_2 = (?x, \mathrm{succ}, ?y)$ be SPARQL expressions. It can be seen easily that the result
of SPARQL$_{\mathrm{LD}}$ query $Q^{P_1}$ over $W_{\mathrm{inf}}$ is finite, whereas $Q^{P_2}(W_{\mathrm{inf}})$ is infinite.
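
The difference between the two results can be illustrated with a Python sketch in which the documents of the infinite Web are generated on demand. The names `u`, `data_inf`, `solutions_p1`, and `solutions_p2` as well as the strings `"u1"`, `"u2"`, and `"succ"` are stand-ins chosen for this sketch. Both functions exploit knowledge about how this particular Web is built, which an LD machine would not have.

```python
from itertools import count, islice

SUCC = "succ"  # stands for the URI that identifies the successor relation


def u(k: int) -> str:
    return f"u{k}"  # hypothetical stand-in for the URI u_k


def data_inf(k: int) -> set:
    """Data of document d_k: the single triple (u_k, succ, u_{k+1})."""
    return {(u(k), SUCC, u(k + 1))}


def solutions_p1() -> list:
    """Solutions of P1 = (u_1, succ, ?v); only d_1 holds a matching triple."""
    return [{"?v": o} for (s, p, o) in data_inf(1) if s == u(1) and p == SUCC]


def solutions_p2():
    """Solutions of P2 = (?x, succ, ?y); one per document, hence infinitely many."""
    for k in count(1):
        for (s, p, o) in data_inf(k):
            if p == SUCC:
                yield {"?x": s, "?y": o}


p1_result = solutions_p1()
p2_prefix = list(islice(solutions_p2(), 3))
```

The result for the first pattern is a complete, finite list, whereas for the second pattern only a finite prefix of the infinite result can ever be materialized.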

The example demonstrates that some SPARQL$_{\mathrm{LD}}$ queries have a finite result over some
infinite Web of Linked Data and some queries have an infinite result. Consequently,
we are interested in a decision problem Finiteness(SPARQL$_{\mathrm{LD}}$) which asks, given a
(potentially infinite) Web of Linked Data $W$ and a satisfiable SPARQL expression $P$,
whether $Q^P(W)$ is finite. Unfortunately, we cannot answer the problem in general:

Theorem 3. Finiteness(SPARQL$_{\mathrm{LD}}$) is not LD machine decidable.



5 Reachability-Based Semantics 

Our results in the previous section show that SPARQL queries under full-Web
semantics have a very limited computability. As a consequence, any SPARQL-based query
approach for Linked Data that uses full-Web semantics requires some ad hoc mechanism
to abort query executions and, thus, has to accept incomplete query results. Depending
on the abort mechanism the query execution may even be nondeterministic. If we take
these issues as an obstacle, we are interested in an alternative, well-defined semantics
for SPARQL over Linked Data. In this section we discuss a family of such
semantics which we call reachability-based semantics. These semantics restrict the scope of
queries to data that is reachable by traversing certain data links using a given set of URIs
as starting points. Hereafter, we refer to queries under any reachability-based
semantics as SPARQL$_{\mathrm{LD(R)}}$ queries. In the remainder of this section we formally introduce
reachability-based semantics, discuss theoretical properties of SPARQL$_{\mathrm{LD(R)}}$ queries,
and compare SPARQL$_{\mathrm{LD(R)}}$ to SPARQL$_{\mathrm{LD}}$.

5.1 Definition 

The basis of any reachability-based semantics is a notion of reachability of LD documents.
Informally, an LD document is reachable if there exists a (specific) path in the
link graph of a Web of Linked Data to the document in question; the potential starting
points for such a path are LD documents that are authoritative for a given set of
entities. However, allowing for arbitrary paths might be questionable in practice because
this approach would require following all data links (recursively) for answering a
query completely. Consequently, we introduce the notion of a reachability criterion that
supports an explicit specification of what data links should be followed.

¹ In this paper we write $\mathbb{N}^+$ to denote the set of all natural numbers without zero.

Definition 9. Let $\mathcal{T}$ be the infinite set of all possible RDF triples and let $\mathcal{P}$ be the
infinite set of all possible SPARQL expressions. A reachability criterion $c$ is a (Turing)
computable function $c : \mathcal{T} \times \mathcal{U} \times \mathcal{P} \to \{\text{true}, \text{false}\}$.

An example for a reachability criterion is $c_{\mathrm{All}}$, which corresponds to the aforementioned
approach of allowing for arbitrary paths to reach LD documents; hence, for each tuple
$(t, u, P) \in \mathcal{T} \times \mathcal{U} \times \mathcal{P}$ it holds $c_{\mathrm{All}}(t, u, P) = \text{true}$. The complement of $c_{\mathrm{All}}$ is $c_{\mathrm{None}}$,
which always returns false. Another example is $c_{\mathrm{Match}}$, which specifies the notion of
reachability that we use for link traversal based query execution [10, 12].

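$$c_{\mathrm{Match}}(t, u, P) = \begin{cases} \text{true} & \text{if there exists a triple pattern } tp \text{ in } P \text{ and } t \text{ matches } tp, \\ \text{false} & \text{else,} \end{cases}$$

where an RDF triple $t = (x_1, x_2, x_3)$ matches a triple pattern $tp = (\tilde{x}_1, \tilde{x}_2, \tilde{x}_3)$ if for
all $i \in \{1, 2, 3\}$ holds: If $\tilde{x}_i \notin \mathcal{V}$, then $\tilde{x}_i = x_i$.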
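The three example criteria can be written down as simple predicates. The following is a minimal sketch under assumptions of our own: terms are plain strings, variables are the strings that start with `?`, and the SPARQL expression is represented only by the list of triple patterns it contains (which is all that $c_{\mathrm{Match}}$ inspects).

```python
def is_variable(term):
    """Assumed convention: variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def matches(triple, triple_pattern):
    """t matches tp if every non-variable position of tp equals t at that position."""
    return all(is_variable(p) or p == x for x, p in zip(triple, triple_pattern))


def c_all(triple, uri, patterns):
    """Accept every data link."""
    return True


def c_none(triple, uri, patterns):
    """Reject every data link."""
    return False


def c_match(triple, uri, patterns):
    """Accept a link only if its triple matches some triple pattern of P."""
    return any(matches(triple, tp) for tp in patterns)
```

For instance, with the single pattern `("?x", "knows", "?y")`, `c_match` accepts links found in `("alice", "knows", "bob")` and rejects links found in `("alice", "name", "Alice")`.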

We call a reachability criterion $c_1$ less restrictive than another criterion $c_2$ if i) for
each $(t, u, P) \in \mathcal{T} \times \mathcal{U} \times \mathcal{P}$ for which $c_2(t, u, P) = \text{true}$, also holds $c_1(t, u, P) = \text{true}$
and ii) there exists a $(t', u', P') \in \mathcal{T} \times \mathcal{U} \times \mathcal{P}$ such that $c_1(t', u', P') = \text{true}$ but
$c_2(t', u', P') = \text{false}$. It can be seen that $c_{\mathrm{All}}$ is the least restrictive criterion, whereas
$c_{\mathrm{None}}$ is the most restrictive criterion. We now define reachability of LD documents:

Definition 10. Let $S \subset \mathcal{U}$ be a finite set of seed URIs; let $c$ be a reachability criterion;
let $P$ be a SPARQL expression; and let $W = (D, data, adoc)$ be a Web of Linked Data.
An LD document $d \in D$ is $(c, P)$-reachable from $S$ in $W$ if either

1. there exists a URI $u \in S$ such that $adoc(u) = d$; or

2. there exist $d' \in D$, $t \in data(d')$, and $u \in uris(t)$ such that i) $d'$ is $(c, P)$-reachable
from $S$ in $W$, ii) $adoc(u) = d$, and iii) $c(t, u, P) = \text{true}$.

Based on reachability of LD documents we define reachable parts of a Web of Linked 
Data. Such a part is an induced subweb covering all reachable LD documents. Formally: 

Definition 11. Let $S \subset \mathcal{U}$ be a finite set of URIs; let $c$ be a reachability criterion; let
$P$ be a SPARQL expression; and let $W = (D, data, adoc)$ be a Web of Linked Data.
The $(S, c, P)$-reachable part of $W$, denoted by $W_c^{(S,P)}$, is an induced subweb $(D_{\mathfrak{R}},
data_{\mathfrak{R}}, adoc_{\mathfrak{R}})$ of $W$ such that $D_{\mathfrak{R}} = \{d \in D \mid d \text{ is } (c, P)\text{-reachable from } S \text{ in } W\}$.
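For a finite Web, the set of reachable documents can be computed with a breadth-first traversal that follows the two cases of the reachability definition. The sketch below is illustrative only and rests on assumed representations: `data` maps hypothetical document identifiers to sets of triples, `adoc` is a dictionary from URIs to document identifiers (a missing key means the mapping is undefined for that URI), and a criterion is any function `(t, u, P) -> bool`.

```python
from collections import deque


def c_all(triple, uri, patterns):
    return True


def c_none(triple, uri, patterns):
    return False


def reachable_documents(data, adoc, seeds, criterion, patterns):
    """Return the identifiers of all (c, P)-reachable documents."""
    reached = set()
    queue = deque()
    # Case 1: documents that are authoritative for a seed URI.
    for u in seeds:
        d = adoc.get(u)
        if d is not None and d not in reached:
            reached.add(d)
            queue.append(d)
    # Case 2: follow a link (t, u) out of a reachable document if c(t, u, P) holds.
    while queue:
        d_prime = queue.popleft()
        for t in data[d_prime]:
            for u in t:  # every term is tried as a URI; non-URIs have no adoc entry
                d = adoc.get(u)
                if d is not None and d not in reached and criterion(t, u, patterns):
                    reached.add(d)
                    queue.append(d)
    return reached
```

The loop terminates exactly because the assumed Web is finite; over an infinite Web the same traversal may run forever, which is the situation studied in the next subsection.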

We now use the concept of reachable parts to define SPARQLld(R) queries. 

Definition 12. Let $S \subset \mathcal{U}$ be a finite set of URIs; let $c$ be a reachability criterion; and
let $P$ be a SPARQL expression. The SPARQLld(R) query that uses $P$, $S$, and $c$, denoted
by $Q_c^{P,S}$, is a Linked Data query that, for any Web of Linked Data $W$, is defined as
$Q_c^{P,S}(W) = [\![P]\!]_{AllData(W_c^{(S,P)})}$ (where $W_c^{(S,P)}$ is the $(S, c, P)$-reachable part of $W$).



As can be seen from Definition 12, our notion of SPARQLld(R) consists of a family
of (reachability-based) query semantics, each of which is characterized by a certain
reachability criterion. Therefore, we refer to SPARQLld(R) queries for which we use a
particular reachability criterion $c$ as SPARQLld(R) queries under $c$-semantics.

Definition 12 also shows that query results depend on the given set $S \subset \mathcal{U}$ of
seed URIs. It is easy to see that any SPARQLld(R) query which uses an empty set of
seed URIs is not satisfiable and, thus, monotonic and finitely computable. We therefore
consider only nonempty sets of seed URIs in the remainder of this paper.

5.2 Completeness and Infiniteness 

Definition 12 defines precisely what the sound and complete result of any SPARQLld(R)
query $Q_c^{P,S}$ over any Web of Linked Data $W$ is. However, in contrast to SPARQLld, it
is not guaranteed that such a (complete) SPARQLld(R) result is complete w.r.t. all data
on $W$. This difference can be attributed to the fact that the corresponding $(S, c, P)$-reachable
part of $W$ may not cover $W$ as a whole. We emphasize that such an incomplete
coverage is even possible for the reachability criterion $c_{\mathrm{All}}$ because the link graph
of $W$ may not be connected; therefore, $c_{\mathrm{All}}$-semantics differs from full-Web semantics.
The following result relates SPARQLld(R) queries to their SPARQLld counterparts.

Proposition 3. Let $Q_c^{P,S}$ be a SPARQLld(R) query; let $Q^{P}$ be the SPARQLld query that
uses the same SPARQL expression as $Q_c^{P,S}$; let $W$ be a Web of Linked Data. It holds:

1. If $Q^{P}$ is monotonic, then $Q_c^{P,S}(W) \subseteq Q^{P}(W)$.

2. $Q_c^{P,S}(W) = Q^{P}(W_c^{(S,P)})$ (recall, $W_c^{(S,P)}$ is the $(S, c, P)$-reachable part of $W$).

Since any SPARQLld query over a finite Web of Linked Data has a finite result (cf.
Proposition 2), we use Proposition 3, case 2, to show the same for SPARQLld(R):

Proposition 4. The result of any SPARQLld(R) query $Q_c^{P,S}$ over a finite Web of Linked
Data $W$ is finite; so is the $(S, c, P)$-reachable part of $W$.

For the case of an infinite Web of Linked Data the results of SPARQLld(R) queries
may be either finite or infinite. In Example 4 we found the same heterogeneity for
SPARQLld. However, for SPARQLld(R) we may identify the following dependencies.

Proposition 5. Let $S \subset \mathcal{U}$ be a finite, nonempty set of URIs; let $c$ and $c'$ be reachability
criteria; and let $P$ be a SPARQL expression. Let $W$ be an infinite Web of Linked Data.

1. $W_{c_{\mathrm{None}}}^{(S,P)}$ is always finite; so is $Q_{c_{\mathrm{None}}}^{P,S}(W)$.

2. If $W_c^{(S,P)}$ is finite, then $Q_c^{P,S}(W)$ is finite.

3. If $Q_c^{P,S}(W)$ is infinite, then $W_c^{(S,P)}$ is infinite.

4. If $c$ is less restrictive than $c'$ and $W_c^{(S,P)}$ is finite, then $W_{c'}^{(S,P)}$ is finite.

5. If $c'$ is less restrictive than $c$ and $W_c^{(S,P)}$ is infinite, then $W_{c'}^{(S,P)}$ is infinite.

Proposition 5 provides valuable insight into the dependencies between reachability criteria,
the (in)finiteness of reachable parts of an infinite Web, and the (in)finiteness
of query results. In practice, however, we are primarily interested in answering two
decision problems: FinitenessReachablePart and Finiteness(SPARQLld(R)).
While the latter problem is the SPARQLld(R) equivalent to Finiteness(SPARQLld)
(cf. Section 4.2), the former has the same input as Finiteness(SPARQLld(R)) (that is,
a Web of Linked Data and a SPARQLld(R) query) and asks whether the corresponding
reachable part of the given Web is finite. Both problems are undecidable in our context:

Theorem 4. FinitenessReachablePart and Finiteness(SPARQLld(R)) are
not LD machine decidable.



5.3 Satisfiability, Nontrivial Satisfiability, Monotonicity, and Computability

We now investigate satisfiability, nontrivial satisfiability, monotonicity, and computability
of SPARQLld(R) queries. First, we identify the following dependencies.

Proposition 6. Let $Q_c^{P,S}$ be a SPARQLld(R) query that uses a nonempty $S \subset \mathcal{U}$.

1. $Q_c^{P,S}$ is satisfiable if and only if $P$ is satisfiable.

2. $Q_c^{P,S}$ is nontrivially satisfiable if and only if $P$ is nontrivially satisfiable.

3. $Q_c^{P,S}$ is monotonic if $P$ is monotonic.

Proposition 6 reveals a first major difference between SPARQLld(R) and SPARQLld:
The statement about monotonicity in that proposition is only a material conditional,
whereas it is a biconditional in the case of SPARQLld (cf. Proposition 1). The reason
for this disparity is that there exist SPARQLld(R) queries for which monotonicity is
independent of the corresponding SPARQL expression. The following proposition
identifies such a case.

Proposition 7. Any SPARQLld(R) query $Q_{c_{\mathrm{None}}}^{P,S}$ is monotonic if $|S| = 1$.

Before we may come back to the aforementioned disparity, we focus on the computabil- 
ity of SPARQLld(r) queries. We first show the following, noteworthy result. 

Lemma 2. Let $Q_c^{P,S}$ be a SPARQLld(R) query that is nontrivially satisfiable. There
exists an LD machine that computes $Q_c^{P,S}$ over any (potentially infinite) Web of Linked
Data $W$ and that halts after a finite number of computation steps with an encoding of
$Q_c^{P,S}(W)$ on its output tape if and only if the $(S, c, P)$-reachable part of $W$ is finite.
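The intuition behind Lemma 2 can be illustrated with a traversal that discovers the reachable part on the fly. This is a sketch under our own assumptions, not the LD machine used in the proof: `lookup` is a hypothetical function that returns the triples of the document for a URI (or `None` if there is none), and a `max_lookups` budget is added only so that the demonstration itself stops on an infinite reachable part.

```python
from collections import deque


def lookup_inf(u):
    """Hypothetical lookup for the infinite Web of Example 4: 'u<k>' yields d_k."""
    if u.startswith("u") and u[1:].isdigit():
        k = int(u[1:])
        return {(f"u{k}", "succ", f"u{k + 1}")}
    return None  # adoc is undefined for every other URI


def c_all(triple, uri, patterns):
    return True


def c_none(triple, uri, patterns):
    return False


def traverse(lookup, seeds, criterion, patterns, max_lookups):
    """Collect the data of the reachable part; report whether traversal finished."""
    seen, queue = set(seeds), deque(seeds)
    triples, lookups = set(), 0
    while queue:
        if lookups >= max_lookups:
            return triples, False  # budget exhausted, reachable part not exhausted
        u = queue.popleft()
        lookups += 1
        doc = lookup(u)
        if doc is None:
            continue
        triples |= doc
        for t in doc:
            for term in t:
                if term not in seen and criterion(t, term, patterns):
                    seen.add(term)
                    queue.append(term)
    return triples, True  # queue ran empty: the reachable part is finite
```

With seed `u1`, the traversal finishes after one lookup under `c_none`, whereas under `c_all` it keeps discovering new documents and only stops because of the artificial budget.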

The importance of Lemma 2 lies in showing that some computations of nontrivially
satisfiable SPARQLld(R) queries may terminate. This possibility presents another major
difference between SPARQLld(R) and SPARQLld (recall Lemma 1 which shows
that any possible computation of nontrivially satisfiable SPARQLld queries never
terminates). Based on Lemma 2 we may even show that a particular class of satisfiable
SPARQLld(R) queries are finitely computable. This class comprises all queries that use
a reachability criterion which ensures the finiteness of reachable parts of any queried
Web of Linked Data. We define this property of reachability criteria as follows:

Definition 13. A reachability criterion $c$ ensures finiteness if for any Web of Linked
Data $W$, any (finite) set $S \subset \mathcal{U}$ of seed URIs, and any SPARQL expression $P$, the
$(S, c, P)$-reachable part of $W$ is finite.

We may now show the aforementioned result: 



Proposition 8. Let $c$ be a reachability criterion that ensures finiteness. SPARQLld(R)
queries under $c$-semantics are finitely computable.

While it remains an open question whether the property to ensure finiteness is decidable
for all reachability criteria, it is easy to verify the property for criteria which always only
accept a given, constant set of data links. For a formal discussion of such criteria, which
we call constant reachability criteria, we refer to Appendix D. $c_{\mathrm{None}}$ is a special case
of these criteria; Proposition 5, case 1, verifies that $c_{\mathrm{None}}$ ensures finiteness.

Notice, for any reachability criterion $c$ that ensures finiteness, the computability
of SPARQLld(R) queries under $c$-semantics does not depend on the monotonicity of
these queries. This independence is another difference to SPARQLld queries (recall
Theorem 1). However, for any other reachability criterion (including $c_{\mathrm{Match}}$ and $c_{\mathrm{All}}$),
we have a similar dependency between monotonicity and computability of (satisfiable)
SPARQLld(R) queries as we have for SPARQLld queries (recall Theorem 1):

Theorem 5. Let $c_{nf}$ be a reachability criterion that does not ensure finiteness. If a
satisfiable SPARQLld(R) query $Q_{c_{nf}}^{P,S}$ (under $c_{nf}$-semantics) is monotonic, then $Q_{c_{nf}}^{P,S}$ is
either finitely computable or eventually computable; otherwise, $Q_{c_{nf}}^{P,S}$ may not even be
eventually computable.

By comparing Theorems 1 and 5 we notice that SPARQLld queries and SPARQLld(R)
queries (that use a reachability criterion which does not ensure finiteness) feature a 
similarly limited computability. However, the reasons for both of these results differ 
significantly: In the case of SPARQLld the limitation follows from the infiniteness of 
U, whereas, for SPARQLld(R) the limitation is a consequence of the possibility to query 
an infinitely large Web of Linked Data. 

However, even if the computability of many SPARQLld(R) queries is as limited
as that of their SPARQLld counterparts, there is another major difference: Lemma 2
shows that for (nontrivially satisfiable) SPARQLld(R) queries which are not finitely
computable, the computation over some Webs of Linked Data may still terminate; this
includes all finite Webs (cf. Proposition 4) but also some infinite Webs (cf. proof of
Lemma 2). Such a possibility does not exist for nontrivially satisfiable SPARQLld
queries (cf. Lemma 1). Nonetheless, the termination problem for SPARQLld(R) is
undecidable in our context.

Theorem 6. Termination(SPARQLld(R)) is not LD machine decidable.

We now come back to the impossibility of showing that SPARQLld(R) queries (with a
nonempty set of seed URIs) are monotonic only if their SPARQL expression is monotonic.
Recall, for some SPARQLld(R) queries monotonicity is irrelevant for identifying
the computability (cf. Proposition 8). We are primarily interested in the monotonicity
of all other (satisfiable) SPARQLld(R) queries because for those queries computability
depends on monotonicity as we show in Theorem 5. Remarkably, for those queries it is
possible to show the required dependency that was missing from Proposition 6.

Proposition 9. Let $Q_{c_{nf}}^{P,S}$ be a SPARQLld(R) query that uses a finite, nonempty $S \subset \mathcal{U}$
and a reachability criterion $c_{nf}$ which does not ensure finiteness. $Q_{c_{nf}}^{P,S}$ is monotonic
only if $P$ is monotonic.



6 Conclusions 

Our investigation of SPARQL as a language for Linked Data queries reveals the fol- 
lowing main results. Some special cases aside, the computability of queries under any 
of the studied semantics is limited and no guarantee for termination can be given. For 
reachability-based semantics it is at least possible that some of the (non-special case) 
query computations terminate; although, in general it is undecidable which. As a conse- 
quence, any SPARQL-based query system for Linked Data on the Web must be prepared 
for query executions that discover an infinite amount of data and that do not terminate. 

Our results also show that, for reachability-based semantics, the aforementioned
issues must be attributed to the possibility for infiniteness in the queried Web (which is
a result of data generating servers). Therefore, it seems worthwhile to study approaches
for detecting whether the execution of a SPARQLld(R) query traverses an infinite path
in the queried Web. However, the mentioned issues may also be addressed by another,
alternative well-defined semantics that restricts the scope of queries even further (or
differently) than our reachability-based semantics. It remains an open question how
such an alternative may still allow for queries that tap the full potential of the Web.

We also show that computability depends on satisfiability and monotonicity and that
for (almost all) SPARQL-based Linked Data queries, these two properties directly
correspond to the same property for the used SPARQL expression. While Arenas and Pérez
show that the core fragment of SPARQL without OPT is monotonic [3], it requires further
work to identify (non-)satisfiable and (non-)monotonic fragments and, thus, enable
an explicit classification of SPARQL-based Linked Data queries w.r.t. computability.
Appendix

The Appendix is organized as follows: 

- Appendix A describes how we encode relevant structures (such as a Web of Linked
Data and a valuation) on the tapes of Turing machines.

- Appendix B provides a formal definition of SPARQL.

- Appendix C contains the full technical proofs for all results in the paper.

- Appendix D provides a formal discussion of constant reachability criteria.

A Encoding 

To encode Webs of Linked Data and query results on the tapes of a Turing machine
we assume the existence of a total order $\prec_U$, $\prec_B$, $\prec_L$, and $\prec_V$ for the URIs in $\mathcal{U}$, the
blank nodes in $\mathcal{B}$, the constants in $\mathcal{L}$, and the variables in $\mathcal{V}$, respectively; in all four
cases $\prec_x$ could simply be the lexicographic order of corresponding string representations.
Furthermore, we assume a total order $\prec_t$ for RDF triples that is based on the
aforementioned orders.

For each $u \in \mathcal{U}$, $c \in \mathcal{L}$, and $v \in \mathcal{V}$ let $enc(u)$, $enc(c)$, and $enc(v)$ be the binary
representation of $u$, $c$, and $v$, respectively. The encoding of an RDF triple $t = (s, p, o)$,
denoted by $enc(t)$, is a word $\langle\, enc(s) \,,\, enc(p) \,,\, enc(o) \,\rangle$.

The encoding of a finite set of RDF triples $T = \{t_1, \ldots, t_n\}$, denoted by $enc(T)$, is a
word $\langle\langle\, enc(t_1) \,,\, enc(t_2) \,,\, \ldots \,,\, enc(t_n) \,\rangle\rangle$ where the $enc(t_i)$ are ordered as follows: For
each two RDF triples $t_x, t_y \in T$, $enc(t_x)$ occurs before $enc(t_y)$ in $enc(T)$ if $t_x \prec_t t_y$.

For a Web of Linked Data $W = (D, data, adoc)$, the encoding of LD document
$d \in D$, denoted by $enc(d)$, is the word $enc(data(d))$. The encoding of $W$ itself, denoted
by $enc(W)$, is a word

$\sharp \; enc(u_1) \; enc(adoc(u_1)) \; \sharp \; \ldots \; \sharp \; enc(u_i) \; enc(adoc(u_i)) \; \sharp \; \ldots$

where $u_1, \ldots, u_i, \ldots$ is the (potentially infinite but countable) list of URIs in $dom(adoc)$,
ordered according to $\prec_U$.

The encoding of a valuation $\mu$ with $dom(\mu) = \{v_1, \ldots, v_n\}$, denoted by $enc(\mu)$, is
a word

$\langle\langle\, enc(v_1) \to enc(\mu(v_1)) \,,\, \ldots \,,\, enc(v_n) \to enc(\mu(v_n)) \,\rangle\rangle$

where the $enc(\mu(v_i))$ are ordered as follows: For each two variables $v_x, v_y \in dom(\mu)$,
$enc(\mu(v_x))$ occurs before $enc(\mu(v_y))$ in $enc(\mu)$ if $v_x \prec_V v_y$.

Finally, the encoding of a (potentially infinite) set of valuations $\Omega = \{\mu_1, \mu_2, \ldots\}$,
denoted by $enc(\Omega)$, is a word $enc(\mu_1)\, enc(\mu_2) \ldots$ where the $enc(\mu_i)$ may occur in any
order.
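The purpose of the ordering requirements is that equal sets always yield the same word. The following sketch illustrates this with assumptions of our own: terms are strings, the "binary representation" is taken to be the UTF-8 bytes written as bits, the total orders are Python's string and tuple orders, and ASCII brackets stand in for the delimiter symbols.

```python
def enc_term(term):
    """Assumed binary representation: the UTF-8 bytes of the term as a bit string."""
    return "".join(f"{byte:08b}" for byte in term.encode("utf-8"))


def enc_triple(t):
    """Encode a triple (s, p, o) as a delimited word."""
    s, p, o = t
    return f"<{enc_term(s)},{enc_term(p)},{enc_term(o)}>"


def enc_triples(triples):
    """Encode a finite set of triples, sorted by the (assumed) total order on triples."""
    return "<<" + ",".join(enc_triple(t) for t in sorted(triples)) + ">>"


def enc_valuation(mu):
    """Encode a valuation (dict variable -> term), sorted by variable."""
    pairs = sorted(mu.items())
    return "<<" + ",".join(f"{enc_term(v)}->{enc_term(x)}" for v, x in pairs) + ">>"
```

Because the elements are sorted before they are written, the word for a set does not depend on the order in which the set was built.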



B Formal Definition of SPARQL 

A SPARQL filter condition is defined recursively as follows: i) If $?x, ?y \in \mathcal{V}$ and
$c \in (\mathcal{U} \cup \mathcal{L})$ then $?x = c$, $?x = ?y$, and $\mathrm{bound}(?x)$ are filter conditions; ii) If $R_1$
and $R_2$ are filter conditions then $(\neg R_1)$, $(R_1 \land R_2)$, and $(R_1 \lor R_2)$ are filter conditions.



Definition 14. A SPARQL expression is defined recursively as follows:

1. A tuple $(s, p, o) \in (\mathcal{V} \cup \mathcal{U}) \times (\mathcal{V} \cup \mathcal{U}) \times (\mathcal{V} \cup \mathcal{U} \cup \mathcal{L})$ is a SPARQL expression. We
call such a tuple a triple pattern.

2. If $P_1$ and $P_2$ are SPARQL expressions, then $(P_1 \text{ AND } P_2)$, $(P_1 \text{ UNION } P_2)$, and
$(P_1 \text{ OPT } P_2)$ are SPARQL expressions.

3. If $P'$ is a SPARQL expression and $R$ is a filter condition, then $(P' \text{ FILTER } R)$ is a
SPARQL expression.

Let $\mu$ be a valuation and let $R$ be a filter condition. We say $\mu$ satisfies $R$ iff either
i) $R$ is $?x = c$, $?x \in dom(\mu)$ and $\mu(?x) = c$; ii) $R$ is $?x = ?y$, $?x, ?y \in dom(\mu)$ and
$\mu(?x) = \mu(?y)$; iii) $R$ is $\mathrm{bound}(?x)$ and $?x \in dom(\mu)$; iv) $R$ is $(\neg R_1)$ and $\mu$ does not
satisfy $R_1$; v) $R$ is $(R_1 \land R_2)$ and $\mu$ satisfies $R_1$ and $R_2$; or vi) $R$ is $(R_1 \lor R_2)$ and $\mu$
satisfies $R_1$ or $R_2$.

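Let $\Omega_l$, $\Omega_r$ and $\Omega$ be (potentially infinite but countable) sets of valuations; let $R$
be a filter condition. The binary operations join, union, difference, and left outer-join
between $\Omega_l$ and $\Omega_r$, as well as the selection on $\Omega$, are defined as follows:

$\Omega_l \bowtie \Omega_r = \{\mu_l \cup \mu_r \mid \mu_l \in \Omega_l \text{ and } \mu_r \in \Omega_r \text{ and } \mu_l \sim \mu_r\}$

$\Omega_l \cup \Omega_r = \{\mu \mid \mu \in \Omega_l \text{ or } \mu \in \Omega_r\}$

$\Omega_l \setminus \Omega_r = \{\mu_l \in \Omega_l \mid \forall \mu_r \in \Omega_r : \mu_l \not\sim \mu_r\}$

$\Omega_l ⟕ \Omega_r = (\Omega_l \bowtie \Omega_r) \cup (\Omega_l \setminus \Omega_r)$

$\sigma_R(\Omega) = \{\mu \in \Omega \mid \mu \text{ satisfies } R\}$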
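For finite sets of valuations these operators translate almost literally into code. The sketch below assumes, as is standard for this algebra but not restated in this appendix, that two valuations are compatible ($\sim$) if they agree on every variable they share. Valuations are modeled as dictionaries and sets of valuations as duplicate-free lists; the function names are our own.

```python
def compatible(mu_l, mu_r):
    """Assumed meaning of ~: both valuations agree on all shared variables."""
    return all(mu_r[v] == x for v, x in mu_l.items() if v in mu_r)


def dedupe(valuations):
    """Keep the first occurrence of each valuation (dicts are not hashable)."""
    result = []
    for mu in valuations:
        if mu not in result:
            result.append(mu)
    return result


def join(left, right):
    return dedupe([{**l, **r} for l in left for r in right if compatible(l, r)])


def union(left, right):
    return dedupe(left + right)


def difference(left, right):
    return [l for l in left if all(not compatible(l, r) for r in right)]


def left_outer_join(left, right):
    return union(join(left, right), difference(left, right))


def select(omega, satisfies):
    """sigma_R: 'satisfies' is a predicate that stands in for the filter condition R."""
    return [mu for mu in omega if satisfies(mu)]
```

In a left outer-join, a valuation from the left operand survives unchanged exactly when no valuation on the right is compatible with it.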

Definition 15. Let $P$ be a SPARQL expression and let $G$ be a (potentially infinite but
countable) set of RDF triples. The evaluation of $P$ over $G$, denoted by $[\![P]\!]_G$, is defined
recursively as follows:

1. If $P$ is a triple pattern $tp$, then

$[\![P]\!]_G = \{\mu \mid \mu \text{ is a valuation with } dom(\mu) = vars(tp) \text{ and } \mu[tp] \in G\}$

2. If $P$ is $(P_1 \text{ AND } P_2)$, then $[\![P]\!]_G = [\![P_1]\!]_G \bowtie [\![P_2]\!]_G$.

3. If $P$ is $(P_1 \text{ UNION } P_2)$, then $[\![P]\!]_G = [\![P_1]\!]_G \cup [\![P_2]\!]_G$.

4. If $P$ is $(P_1 \text{ OPT } P_2)$, then $[\![P]\!]_G = [\![P_1]\!]_G ⟕ [\![P_2]\!]_G$.

5. If $P$ is $(P' \text{ FILTER } R)$, then $[\![P]\!]_G = \sigma_R([\![P']\!]_G)$.

Each valuation $\mu \in [\![P]\!]_G$ is called a solution for $P$ in $G$.
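The recursive evaluation can be sketched for finite graphs as follows. The representation is an assumption made for illustration: a triple pattern is written `("TP", (s, p, o))`, the binary operators as `("AND", P1, P2)`, `("UNION", P1, P2)`, `("OPT", P1, P2)`, and a filter as `("FILTER", P, predicate)` where the predicate is a Python function that stands in for the filter condition. Compatibility of valuations is assumed to mean agreement on shared variables.

```python
def is_var(term):
    return term.startswith("?")


def compatible(m1, m2):
    return all(m2[v] == x for v, x in m1.items() if v in m2)


def dedupe(valuations):
    result = []
    for mu in valuations:
        if mu not in result:
            result.append(mu)
    return result


def match(tp, triple):
    """Return the valuation mu with mu[tp] == triple, or None if there is none."""
    mu = {}
    for p, x in zip(tp, triple):
        if is_var(p):
            if mu.setdefault(p, x) != x:  # same variable bound to two values
                return None
        elif p != x:
            return None
    return mu


def evaluate(expr, graph):
    """Evaluate a SPARQL expression over a finite set of triples."""
    op = expr[0]
    if op == "TP":
        found = [match(expr[1], t) for t in sorted(graph)]
        return [mu for mu in found if mu is not None]
    if op == "FILTER":
        return [mu for mu in evaluate(expr[1], graph) if expr[2](mu)]
    left, right = evaluate(expr[1], graph), evaluate(expr[2], graph)
    joined = dedupe([{**l, **r} for l in left for r in right if compatible(l, r)])
    if op == "AND":
        return joined
    if op == "UNION":
        return dedupe(left + right)
    if op == "OPT":
        unmatched = [l for l in left if not any(compatible(l, r) for r in right)]
        return dedupe(joined + unmatched)
    raise ValueError(f"unknown operator: {op}")
```

An OPT expression keeps solutions of its left operand even when the optional part contributes nothing, which is the source of non-monotonicity discussed in the paper.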



C Proofs 

C.1 Additional References for the Proofs

[Pap93] C. H. Papadimitriou. Computational Complexity. Addison Wesley, 1993. 



C.2 Proof of Proposition 1, Case 1

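For this proof we introduce a notion of lineage for valuations. Informally, the lineage
of a valuation $\mu$ is the set of all RDF triples that are required to construct $\mu$. Formally:

Definition 16. Let $P$ be a SPARQL expression and $G$ be a (potentially infinite) set of
RDF triples such that $[\![P]\!]_G \neq \emptyset$. For each $\mu \in [\![P]\!]_G$ we define the $(P, G)$-lineage of
$\mu$, denoted by $lin^{P,G}(\mu)$, recursively as follows:

1. If $P$ is a triple pattern $tp$, then $lin^{P,G}(\mu) = \{\mu[tp]\}$.

2. If $P$ is $(P_1 \text{ AND } P_2)$, then

$lin^{P,G}(\mu) = lin^{P_1,G}(\mu_1) \cup lin^{P_2,G}(\mu_2)$

where $\mu_1 \in [\![P_1]\!]_G$ and $\mu_2 \in [\![P_2]\!]_G$ such that $\mu_1 \sim \mu_2$ and $\mu = \mu_1 \cup \mu_2$. Notice,
$\mu_1$ and $\mu_2$ must exist because $\mu \in [\![P]\!]_G$.

3. If $P$ is $(P_1 \text{ UNION } P_2)$, then

$$lin^{P,G}(\mu) = \begin{cases} lin^{P_1,G}(\mu_1) & \text{if } \exists \mu_1 \in [\![P_1]\!]_G : \mu_1 = \mu, \\ lin^{P_2,G}(\mu_2) & \text{if } \exists \mu_2 \in [\![P_2]\!]_G : \mu_2 = \mu. \end{cases}$$

Notice, if $\mu_1$ does not exist then $\mu_2$ must exist because $\mu \in [\![P]\!]_G$.

4. If $P$ is $(P_1 \text{ OPT } P_2)$, then

$$lin^{P,G}(\mu) = \begin{cases} lin^{P_1,G}(\mu_1) \cup lin^{P_2,G}(\mu_2) & \text{if } \exists (\mu_1, \mu_2) \in [\![P_1]\!]_G \times [\![P_2]\!]_G : (\mu_1 \sim \mu_2 \land \mu = \mu_1 \cup \mu_2), \\ lin^{P_1,G}(\mu') & \text{if } \exists \mu' \in [\![P_1]\!]_G : (\mu' = \mu \land \forall \mu^* \in [\![P_2]\!]_G : \mu^* \not\sim \mu'). \end{cases}$$

Notice, either $\mu_1$ and $\mu_2$ or $\mu'$ must exist because $\mu \in [\![P]\!]_G$.

5. If $P$ is $(P' \text{ FILTER } R)$, then $lin^{P,G}(\mu) = lin^{P',G}(\mu')$ where $\mu' \in [\![P']\!]_G$ such that
$\mu = \mu'$. Notice, $\mu'$ must exist because $\mu \in [\![P]\!]_G$.

For any SPARQL expression $P$, any (potentially infinite) set $G$ of RDF triples, and
any valuation $\mu \in [\![P]\!]_G$ it can be easily seen that i) $G' = lin^{P,G}(\mu)$ is finite and
ii) $\mu \in [\![P]\!]_{G'}$. We now prove Proposition 1, case 1.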

If: Let $P$ be a SPARQL expression that is satisfiable. Hence, there exists a set of RDF
triples $G$ such that $[\![P]\!]_G \neq \emptyset$. W.l.o.g., let $\mu$ be an arbitrary solution for $P$ in $G$, that
is, $\mu \in [\![P]\!]_G$. Furthermore, let $G' = lin^{P,G}(\mu)$ be the $(P, G)$-lineage of $\mu$. We use
$G'$ to construct a Web of Linked Data $W_\mu = (D_\mu, data_\mu, adoc_\mu)$ which consists of a
single LD document. This document may be retrieved using any URI and it contains
the $(P, G)$-lineage of $\mu$ (recall that the lineage is guaranteed to be finite). Formally:

$D_\mu = \{d\}$, $\quad data_\mu(d) = G'$, $\quad \forall u \in \mathcal{U} : adoc_\mu(u) = d$

We now consider the result of SPARQLld query $Q^{P}$ (which uses $P$) over $W_\mu$. Obviously,
$AllData(W_\mu) = G'$ and, thus, $Q^{P}(W_\mu) = [\![P]\!]_{G'}$ (cf. Definition 8). Since we
know $\mu \in [\![P]\!]_{G'}$ it holds $Q^{P}(W_\mu) \neq \emptyset$, which shows that $Q^{P}$ is satisfiable.

Only if: Let $Q^{P}$ be a satisfiable SPARQLld query that uses SPARQL expression $P$.
Since $Q^{P}$ is satisfiable, there exists a Web of Linked Data $W$ such that $Q^{P}(W) \neq \emptyset$. Since
$Q^{P}(W) = [\![P]\!]_{AllData(W)}$ (cf. Definition 8), we conclude that $P$ is satisfiable.



C.3 Proof of Proposition 1, Case 2

We prove case 2 of Proposition 1 using the same argumentation that we use in
Section C.2 for case 1.

If: Let $P$ be a SPARQL expression that is nontrivially satisfiable. Hence, there exists
a set of RDF triples $G$ and a valuation $\mu$ such that i) $\mu \in [\![P]\!]_G$ and ii) $dom(\mu) \neq \emptyset$.
Let $G' = lin^{P,G}(\mu)$ be the $(P, G)$-lineage of $\mu$. We use $G'$ to construct a Web of
Linked Data $W_\mu = (D_\mu, data_\mu, adoc_\mu)$ which consists of a single LD document. This
document may be retrieved using any URI and it contains the $(P, G)$-lineage of $\mu$ (recall
that the lineage is guaranteed to be finite). Formally:

$D_\mu = \{d\}$, $\quad data_\mu(d) = G'$, $\quad \forall u \in \mathcal{U} : adoc_\mu(u) = d$

We now consider the result of SPARQLld query $Q^{P}$ (which uses $P$) over $W_\mu$. Obviously,
$AllData(W_\mu) = G'$ and, thus, $Q^{P}(W_\mu) = [\![P]\!]_{G'}$ (cf. Definition 8). Since we
know $\mu \in [\![P]\!]_{G'}$ and $dom(\mu) \neq \emptyset$, we conclude that $Q^{P}$ is nontrivially satisfiable.

Only if: Let $Q^{P}$ be a nontrivially satisfiable SPARQLld query that uses SPARQL
expression $P$. Since $Q^{P}$ is nontrivially satisfiable, there exist a Web of Linked Data $W$
and a valuation $\mu$ such that i) $\mu \in Q^{P}(W)$ and ii) $dom(\mu) \neq \emptyset$. Since $Q^{P}(W) =
[\![P]\!]_{AllData(W)}$ (cf. Definition 8), we conclude that $P$ is nontrivially satisfiable.

C.4 Proof of Proposition 1, Case 3

If: Let:

- $P$ be a SPARQL expression that is monotonic;

- $Q^{P}$ be the SPARQLld query that uses $P$; and

- $W_1$, $W_2$ be an arbitrary pair of Webs of Linked Data such that $W_1$ is an induced
subweb of $W_2$.

To prove that $Q^{P}$ is monotonic it suffices to show $Q^{P}(W_1) \subseteq Q^{P}(W_2)$. According
to Definition 8 we have $Q^{P}(W_1) = [\![P]\!]_{AllData(W_1)}$ and $Q^{P}(W_2) = [\![P]\!]_{AllData(W_2)}$.
Since $W_1$ is an induced subweb of $W_2$ it holds $AllData(W_1) \subseteq AllData(W_2)$. We
may now use the monotonicity of $P$ to show $[\![P]\!]_{AllData(W_1)} \subseteq [\![P]\!]_{AllData(W_2)}$. Hence,
$Q^{P}(W_1) \subseteq Q^{P}(W_2)$.
Only if: Let:

- $Q^{P}$ be a monotonic SPARQLld query that uses SPARQL expression $P$; and

- $G_1$, $G_2$ be an arbitrary pair of sets of RDF triples such that $G_1 \subseteq G_2$.

We distinguish two cases: either $P$ is satisfiable or $P$ is not satisfiable. In the latter case
$P$ is trivially monotonic. Hence, we only have to discuss the first case. To prove that
(the satisfiable) $P$ is monotonic it suffices to show $[\![P]\!]_{G_1} \subseteq [\![P]\!]_{G_2}$.

Similar to the proof for the other direction, we aim to use $G_1$ and $G_2$ for constructing
two Webs of Linked Data $W_1$ and $W_2$ (where $W_1$ is an induced subweb of $W_2$) and then
use the monotonicity of $Q^{P}$ to show the monotonicity of $P$. However, since $G_1$ and
$G_2$ may be (countably) infinite we cannot simply construct Webs of Linked Data that
consist of single LD documents which contain all RDF triples of $G_1$ and $G_2$ (recall, the
data in each LD document of a Web of Linked Data must be finite). As an alternative
strategy we construct Webs of Linked Data that consist of as many LD documents as
we have RDF triples in $G_1$ and $G_2$ (which may be infinitely many). However, since
the data of each LD document in a Web of Linked Data must use a unique set of blank
nodes, we may lose certain solutions $\mu \in [\![P]\!]_{G_1}$ by distributing the RDF triples from
$G_1$ over multiple LD documents; similarly for $G_2$. To avoid this issue we assume i) a
set $U_B \subset \mathcal{U}$ of new URIs not mentioned in $G_2$ (i.e. $U_B \cap terms(G_2) = \emptyset$) and
ii) a bijective mapping $\varrho : terms(G_2) \to (U_B \cup (terms(G_2) \cap (\mathcal{U} \cup \mathcal{L})))$ that, for any
$x \in terms(G_2)$, is defined as follows:

$$\varrho(x) = \begin{cases} \varrho_B(x) & \text{if } x \in (terms(G_2) \cap \mathcal{B}), \\ x & \text{else,} \end{cases}$$

where $\varrho_B : (terms(G_2) \cap \mathcal{B}) \to U_B$ is an arbitrary bijection that maps each blank node
in $G_2$ to a new, unique URI $u \in U_B$.

The application of $\varrho$ to an arbitrary valuation $\mu$, denoted by $\varrho[\mu]$, results in a valuation
$\mu'$ such that $dom(\mu') = dom(\mu)$ and $\mu'(?v) = \varrho(\mu(?v))$ for all $?v \in dom(\mu)$.
Furthermore, the application of $\varrho$ to an arbitrary RDF triple $t = (x_1, x_2, x_3)$, denoted by
$\varrho[t]$, results in an RDF triple $t' = (x_1', x_2', x_3')$ such that $x_i' = \varrho(x_i)$ for all $i \in \{1, 2, 3\}$.
We now let $G_1' = \{\varrho[t] \mid t \in G_1\}$ and $G_2' = \{\varrho[t] \mid t \in G_2\}$. The following facts are
verified easily:
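The renaming of blank nodes can be sketched for finite graphs as follows. The conventions are assumptions made for the illustration: blank nodes are the strings that start with `_:`, and the fresh URIs are generated under a hypothetical prefix `urn:fresh:` that must not occur in the graph (the code checks this instead of silently assuming it).

```python
def is_blank(term):
    """Assumed convention: blank nodes are written with the prefix '_:'."""
    return term.startswith("_:")


def make_rho(graph, fresh_prefix="urn:fresh:"):
    """Build the mapping that sends each blank node of the graph to a new, unique URI."""
    terms = {x for t in graph for x in t}
    if any(x.startswith(fresh_prefix) for x in terms):
        raise ValueError("fresh URIs must not be mentioned in the graph")
    blanks = sorted(x for x in terms if is_blank(x))
    rho_b = {b: f"{fresh_prefix}{i}" for i, b in enumerate(blanks)}
    return lambda x: rho_b.get(x, x)  # every other term is mapped to itself


def apply_to_triple(rho, t):
    return tuple(rho(x) for x in t)


def apply_to_valuation(rho, mu):
    return {v: rho(x) for v, x in mu.items()}
```

Since the mapping is injective on the terms of the larger graph, applying it preserves the subset relation between the two graphs and their sizes.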

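Fact 2. It holds $G_1' \subseteq G_2'$, $|G_1| = |G_1'|$, and $|G_2| = |G_2'|$.

Fact 3. For all $j \in \{1, 2\}$ it holds: Let $\mu$ be an arbitrary valuation, then $\mu' = \varrho[\mu]$ is a
solution for $P$ in $G_j'$ if and only if $\mu$ is a solution for $P$ in $G_j$. More precisely:

$\forall \mu \in [\![P]\!]_{G_j} : \varrho[\mu] \in [\![P]\!]_{G_j'}$ and $\forall \mu' \in [\![P]\!]_{G_j'} : \varrho^{-1}[\mu'] \in [\![P]\!]_{G_j}$

where $\varrho^{-1}$ denotes the inverse of the bijective mapping $\varrho$.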

We now use $G_2'$ to construct a Web of Linked Data $W_2 = (D_2, data_2, adoc_2)$ as follows:
$D_2$ consists of $|G_2'|$ LD documents, each of which contains a particular RDF
triple from $G_2'$. Furthermore, we assume a set $U_2$ of URIs, each of which corresponds
to a particular RDF triple from $G_2'$; hence, $U_2 \subset \mathcal{U}$ and $|U_2| = |G_2'|$. These URIs may
be used to retrieve the LD document for the corresponding RDF triple. For a formal
definition let $d_{t_i}$ denote the LD document for RDF triple $t_i \in G_2'$ and let $u_{t_i}$ denote the
URI that corresponds to $t_i \in G_2'$. Then, we let:

$D_2 = \bigcup_{t_i \in G_2'} \{d_{t_i}\}$, $\quad data_2(d_{t_i}) = \{t_i\}$, $\quad \forall u_{t_i} \in U_2 : adoc_2(u_{t_i}) = d_{t_i}$

In addition to $W_2$, we introduce a Web of Linked Data $W_1 = (D_1, data_1, adoc_1)$ that
is an induced subweb of $W_2$ and that is defined by $D_1 = \{d_{t_i} \in D_2 \mid t_i \in G_1'\}$. Recall,
any induced subweb is unambiguously defined by specifying its set of LD documents.
It can be easily seen that $AllData(W_1) = G_1'$ and $AllData(W_2) = G_2'$.
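For finite graphs the construction of the two Webs can be sketched directly. The identifiers used below (document names `d0`, `d1`, ... and URIs under the hypothetical prefix `urn:doc:`) are illustrative assumptions; any one-to-one naming of triples would serve.

```python
def one_document_per_triple(graph):
    """Build (data, adoc) with exactly one LD document and one URI per triple."""
    data, adoc = {}, {}
    for i, t in enumerate(sorted(graph)):
        doc_id = f"d{i}"
        data[doc_id] = {t}
        adoc[f"urn:doc:{i}"] = doc_id
    return data, adoc


def induced_subweb(data, adoc, keep):
    """Restrict a Web to the documents whose triple set satisfies 'keep'."""
    sub_data = {d: ts for d, ts in data.items() if keep(ts)}
    sub_adoc = {u: d for u, d in adoc.items() if d in sub_data}
    return sub_data, sub_adoc


def all_data(data):
    """Union of the data of all documents."""
    return set().union(*data.values())
```

Building the larger Web first and then restricting it guarantees that the smaller Web is an induced subweb by construction.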



We now use $W_1$ and $W_2$ and the monotonicity of $Q^{P}$ to show $[\![P]\!]_{G_1} \subseteq [\![P]\!]_{G_2}$
(which proves that $P$ is monotonic). W.l.o.g., let $\mu$ be an arbitrary solution for $P$ in $G_1$,
that is, $\mu \in [\![P]\!]_{G_1}$. Notice, such a $\mu$ must exist because we assume that $P$ is satisfiable
(see before). To prove $[\![P]\!]_{G_1} \subseteq [\![P]\!]_{G_2}$ it suffices to show $\mu \in [\![P]\!]_{G_2}$.

Due to Fact 3 it holds $\varrho[\mu] \in [\![P]\!]_{G_1'}$; and with $AllData(W_1) = G_1'$ and Definition 8
we have $[\![P]\!]_{G_1'} = [\![P]\!]_{AllData(W_1)} = Q^{P}(W_1)$. Since $W_1$ is an induced subweb of $W_2$
and $Q^{P}$ is monotonic, it holds $Q^{P}(W_1) \subseteq Q^{P}(W_2)$. We now use $AllData(W_2) = G_2'$
to show $\varrho[\mu] \in [\![P]\!]_{G_2'}$. Finally, we use Fact 3 again and find $\varrho^{-1}[\varrho[\mu]] = \mu \in [\![P]\!]_{G_2}$.

C.5 Proof of Theorem 1

To prove the theorem we first show that no satisfiable SPARQLld query is finitely
computable. Next, we study SPARQLld queries that are not monotonic and show that
these queries are not eventually computable (and, thus, not computable at all). Finally,
we prove that satisfiable, monotonic SPARQLld queries are eventually computable.

To show that no satisfiable SPARQLld query is finitely computable we use a
contradiction, that is, we assume a satisfiable SPARQLld query $Q^{P}$ which is finitely
computable and show that this assumption must be false. If $Q^{P}$ were finitely computable,
there would be an LD machine that, for any Web of Linked Data $W$ encoded on the
Web tape, halts after a finite number of computation steps and produces a possible
encoding of $Q^{P}(W)$ on its output tape (cf. Definition 5). To obtain a contradiction we
show that such a machine does not exist. However, for the proof we assume $M$ were
such a machine.

To compute $Q^{P}$ over an arbitrary Web of Linked Data $W = (D, data, adoc)$ (which
is encoded on the Web tape of $M$), machine $M$ requires access to the data of all LD
documents $d \in D$ (recall that $Q^{P}(W) = [\![P]\!]_{AllData(W)}$ where $AllData(W) =
\{t \in data(d) \mid d \in D\}$). However, $M$ may only access an LD document $d \in D$ (and
its data) after entering the expand state with a corresponding URI $u \in \mathcal{U}$ on the link
traversal tape (i.e. for $u$ it must hold $adoc(u) = d$). Initially, the machine has no
information about which URI(s) to use for accessing any $d \in D$. Hence, to ensure that
all $d \in D$ have been accessed, $M$ must expand all $u \in \mathcal{U}$. Notice, a real query system
for the WWW would have to perform a similar procedure: To guarantee that such a
system sees all documents, it must enumerate and look up all URIs. However, since $\mathcal{U}$
is (countably) infinite, this process does not terminate, which is a contradiction to our
assumption that $M$ halts after a finite number of computation steps. Hence, satisfiable
SPARQLld queries cannot be finitely computable.

We now show by contradiction that non-monotonic SPARQLld queries are not eventually
computable. To obtain a contradiction we assume a (satisfiable) SPARQLld
query $Q^{P}$ that is not monotonic and an LD machine $M$ whose computation of $Q^{P}$
on any Web of Linked Data $W$ has the two properties given in Definition 6. Let $W =
(D, data, adoc)$ be a Web of Linked Data such that $Q^{P}(W) \neq \emptyset$; such a Web exists
because $Q^{P}$ is satisfiable. Let $W$ be encoded on the Web tape of $M$ and let $\mu$ be an
arbitrary solution for $Q^{P}$ in $W$; i.e. $\mu \in Q^{P}(W)$. Based on our assumption, machine $M$
must write $enc(\mu)$ to its output tape after a finite number of computation steps (cf.
property 2 in Definition 6). We argue that this is impossible: Since $Q^{P}$ is not monotonic, $M$
cannot add $\mu$ to the output before it is guaranteed that all $d \in D$ have been accessed.
As discussed before, such a guarantee requires expanding all $u \in \mathcal{U}$ because $M$ has
no a priori information about $W$. However, expanding all $u \in \mathcal{U}$ is a non-terminating
process (due to the infiniteness of $\mathcal{U}$) and, thus, $M$ does not write $\mu$ to its output after
a finite number of steps. As a consequence, the computation of $Q^{P}(W)$ by $M$ does
not have the properties given in Definition 6, which contradicts our initial assumption.
This contradiction shows that non-monotonic SPARQLld queries are not eventually
computable.

Algorithm 1 The program of the (P)-machine.

1: $j := 1$
2: for $u_j \in \mathcal{U}$ do
3:    Call lookup for $u_j$.
4:    Let $T_j$ denote the set of all RDF triples currently encoded on the link traversal tape.
      Use the work tape to enumerate the set $[\![P]\!]_{T_j}$.
5:    For each $\mu \in [\![P]\!]_{T_j}$ check whether $\mu$ is already encoded on the output tape; if not,
      then add $enc(\mu)$ to the output.
6:    $j := j + 1$
7: end for

In the remainder, we prove that satisfiable, monotonic SPARQL$_{\text{LD}}$ queries are
eventually computable. For this proof we introduce specific LD machines which we call
(P)-machines. Such a (P)-machine implements a generic (i.e. input independent)
computation of a SPARQL$_{\text{LD}}$ query $Q^P$. We shall see that if a SPARQL$_{\text{LD}}$ query $Q^P$ is monotonic,
the corresponding (P)-machine (eventually) computes $Q^P$ over any Web of Linked
Data. Formally, we define (P)-machines as follows:

Definition 17. Let $P$ be a SPARQL expression. The (P)-machine is an LD machine
that implements Algorithm 1. This algorithm makes use of a special subroutine called
lookup, which, when called with URI $u \in \mathcal{U}$, i) writes $\mathrm{enc}(u)$ to the right end of the
word on the link traversal tape, ii) enters the expand state, and iii) performs the expand
procedure as specified in Definition 4.

Before we complete the proof we discuss important properties of (P)-machines. As can
be seen in Algorithm 1, any computation performed by (P)-machines enters a loop that
iterates over the set $\mathcal{U}$ of all possible URIs. As discussed before, expanding all $u \in \mathcal{U}$
is necessary to guarantee completeness of the computed query result. However, since $\mathcal{U}$
is (countably) infinite the algorithm does not terminate (which is not a requirement for
eventual computability).
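
The loop of Algorithm 1 can be illustrated with a small Python sketch. It is only a simulation under simplifying assumptions that are not part of the formal model: the Web is a finite dictionary `web` from URIs to sets of triples, the SPARQL expression is a single triple pattern, and the names `match_triple_pattern` and `p_machine` are hypothetical. The generator reports each solution once and never signals completion, which mirrors eventual (but not finite) computability.

```python
from itertools import count, islice


def match_triple_pattern(pattern, triples):
    """Evaluate one triple pattern over a finite set of triples.

    Terms starting with '?' are variables. Each solution is a frozenset of
    (variable, value) pairs so that solutions can be stored in a set.
    """
    solutions = set()
    for triple in triples:
        binding = {}
        matches = True
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # A repeated variable must be bound to the same value.
                if binding.setdefault(p_term, t_term) != t_term:
                    matches = False
                    break
            elif p_term != t_term:
                matches = False
                break
        if matches:
            solutions.add(frozenset(binding.items()))
    return solutions


def p_machine(uris, web, pattern):
    """Generator that mirrors Algorithm 1 for a single triple pattern."""
    tape = set()    # T_j: all triples on the (simulated) link traversal tape
    output = set()  # solutions that were already written to the output
    for u in uris:                         # line 2: iterate over all URIs
        tape |= web.get(u, set())          # line 3: lookup of u
        for mu in match_triple_pattern(pattern, tape):   # line 4
            if mu not in output:           # line 5: avoid duplicates
                output.add(mu)
                yield dict(mu)


if __name__ == "__main__":
    # Hypothetical example data; only two URIs resolve to documents.
    web = {
        "http://example.org/u3": {("a", "knows", "b")},
        "http://example.org/u7": {("b", "knows", "c"), ("b", "age", "7")},
    }
    all_uris = (f"http://example.org/u{i}" for i in count())  # infinite enumeration
    for solution in islice(p_machine(all_uris, web, ("?x", "knows", "?y")), 2):
        print(solution)
```

Requesting a third solution in the usage example would never return, because the enumeration of URIs is infinite and the machine cannot know that no further solution exists.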

During each iteration of its main processing loop, a (P)-machine generates valuations
using all data that is currently encoded on the link traversal tape. The following
lemma shows that these valuations are part of the corresponding query result (the
proof of Lemma 3 is given in Section C.6):

Lemma 3. Let $Q^P$ be a satisfiable SPARQL$_{\text{LD}}$ query that is monotonic; let $M^P$ denote
the (P)-machine for SPARQL expression $P$ used by $Q^P$; and let $W$ be an arbitrary Web
of Linked Data encoded on the Web tape of $M^P$. During the execution of Algorithm 1
by $M^P$ it holds $\forall\, j \in \{1, 2, \ldots\} : [\![P]\!]_{T_j} \subseteq Q^P(W)$.



We now use the notion of (P)-machines to prove that satisfiable, monotonic SPARQL$_{\text{LD}}$
queries are eventually computable. Let $Q^P$ be a satisfiable SPARQL$_{\text{LD}}$ query that is
monotonic and let $W$ be an arbitrary Web of Linked Data encoded on the Web tape of
the (P)-machine for $Q^P$; to denote this machine we write $M^P$. W.l.o.g. it suffices to
show that the computation of $M^P$ on (Web) input $\mathrm{enc}(W)$ has the two properties given
in Definition 6.

During the computation $M^P$ only writes to its output tape when it adds (encoded)
valuations $\mu \in [\![P]\!]_{T_j}$ (for $j = 1, 2, \ldots$). Since all these valuations are solutions for
$Q^P$ in $W$ (cf. Lemma 3) and line 5 in Algorithm 1 ensures that the output is free of
duplicates, we see that the word on the output tape is always a prefix of a possible
encoding of $Q^P(W)$. Hence, the computation of $M^P$ has the first property specified in
Definition 6.

To verify that the computation also has the second property it is important to note
that Algorithm 1 looks up no more than one URI per iteration (cf. line 3). Hence,
(P)-machines prioritize result construction over data retrieval. This feature allows us to
show that for each solution in a query result there exists an iteration during which that
solution is computed (the proof of Lemma 4 is given in Section C.7):

Lemma 4. Let $Q^P$ be a satisfiable SPARQL$_{\text{LD}}$ query that is monotonic; let $M^P$ denote
the (P)-machine for SPARQL expression $P$ used by $Q^P$; and let $W$ be an arbitrary
Web of Linked Data encoded on the Web tape of $M^P$. For each $\mu \in Q^P(W)$ there exists a
$j_\mu \in \{1, 2, \ldots\}$ such that during the execution of Algorithm 1 by $M^P$ it holds
$\forall\, j \in \{j_\mu, j_\mu + 1, \ldots\} : \mu \in [\![P]\!]_{T_j}$.

It remains to show that the computation of $M^P$ definitely reaches each iteration of the
processing loop after a finite number of computation steps. To prove this property we
show that each iteration of the loop finishes after a finite number of computation steps:

- The call of the subroutine lookup (cf. Definition 17) in line 3 of Algorithm 1
terminates because the encoding of $W = (D, \mathrm{data}, \mathrm{adoc})$ is ordered following the
order of the URIs in $\mathrm{dom}(\mathrm{adoc})$.

- At any point in the computation the word on the link traversal tape is finite because
$M^P$ only gradually appends (encoded) LD documents to the link traversal tape and
the encoding of each document is finite (recall that the set of RDF triples $\mathrm{data}(d)$
for each LD document $d$ is finite). Due to the finiteness of the word on the link
traversal tape, each $[\![P]\!]_{T_j}$ (for $j = 1, 2, \ldots$) is finite, resulting in a finite number of
computation steps for line 4 during any iteration.

- Finally, line 5 requires only a finite number of computation steps because the word
on the link traversal tape is finite at any point in the computation; so is the word on
the output tape.

C.6 Proof of Lemma 3

Let:

- $Q^P$ be a SPARQL$_{\text{LD}}$ query that is satisfiable and monotonic;

- $M^P$ denote the (P)-machine for SPARQL expression $P$ used by $Q^P$;

- $W$ be a Web of Linked Data which is encoded on the Web tape of $M^P$.

To prove Lemma 3 we use the following result.

Lemma 5. During the execution of Algorithm 1 by $M^P$ on (Web) input $\mathrm{enc}(W)$ it holds
$\forall\, j \in \{1, 2, \ldots\} : T_j \subseteq \mathrm{AllData}(W)$.

Proof of Lemma 5. The computation of $M^P$ starts with an empty link traversal tape
(cf. Definition 4). Let $w_j$ be the word on the link traversal tape of $M^P$ before $M^P$
executes line 4 during the $j$-th iteration of the main processing loop in Algorithm 1.
It can be easily seen that for each $w_j$ (where $j \in \{1, 2, \ldots\}$) there exists a finite sequence
$u_1, \ldots, u_j$ of $j$ different URIs such that i) $w_j$ is$^1$

$$\mathrm{enc}(u_1)\, \mathrm{enc}(\mathrm{adoc}(u_1))\, \sharp\, \ldots\, \sharp\, \mathrm{enc}(u_j)\, \mathrm{enc}(\mathrm{adoc}(u_j))\, \sharp$$

and ii) for each $i \in [1, j]$ either $u_i \notin \mathrm{dom}(\mathrm{adoc})$ or $\mathrm{adoc}(u_i) \in D$. If $U_j$ is the set that
contains all URIs in this sequence $u_1, \ldots, u_j$, it holds
$T_j = \bigcup \{ \mathrm{data}(\mathrm{adoc}(u_i)) \mid u_i \in U_j \text{ and } u_i \in \mathrm{dom}(\mathrm{adoc}) \}$.
Clearly, $T_j \subseteq \mathrm{AllData}(W)$. □

Due to the monotonicity of $Q^P$ it is trivial to show Lemma 3 using Lemma 5 (recall
$Q^P(W) = [\![P]\!]_{\mathrm{AllData}(W)}$).

C.7 Proof of Lemma 4

Let:

- $Q^P$ be a SPARQL$_{\text{LD}}$ query that is satisfiable and monotonic;

- $M^P$ denote the (P)-machine for SPARQL expression $P$ used by $Q^P$;

- $W$ be a Web of Linked Data which is encoded on the Web tape of $M^P$.

To prove Lemma 4 we use the following result.

Lemma 6. For each RDF triple $t \in \mathrm{AllData}(W)$ there exists a $j_t \in \{1, 2, \ldots\}$ such that
during the execution of Algorithm 1 by $M^P$ on (Web) input $\mathrm{enc}(W)$ it holds
$\forall\, j \in \{j_t, j_t + 1, \ldots\} : t \in T_j$.

Proof of Lemma 6. W.l.o.g., let $t'$ be an arbitrary RDF triple $t' \in \mathrm{AllData}(W)$; hence,
there exists an LD document $d \in D$ such that $t' \in \mathrm{data}(d)$. Let $d'$ be such an LD
document. Since mapping $\mathrm{adoc}$ is surjective (cf. Definition 1), there exists a URI $u \in \mathcal{U}$ such
that $\mathrm{adoc}(u) = d'$. Let $u'$ be such a URI. Since $u' \in \mathcal{U}$, there exists a $j' \in \{1, 2, \ldots\}$ such
that $M^P$ selects $u'$ for processing in the $j'$-th iteration of the main loop in Algorithm 1.
After completing the lookup of $u'$ during this iteration (cf. line 3 in Algorithm 1), the
word on the link traversal tape contains sub-word $\mathrm{enc}(d')$ (cf. Definitions 17 and 4).
Since $t' \in \mathrm{data}(d')$, this word $\mathrm{enc}(d')$ contains sub-word $\mathrm{enc}(t')$ (cf. Appendix A).
Hence, $t' \in T_{j'}$. Since (P)-machines only append to (the right end of) the word on
the link traversal tape, $M^P$ will never remove $\mathrm{enc}(t')$ from that tape and, thus, it holds
$\forall\, j \in \{j', j' + 1, \ldots\} : t' \in T_j$. □

$^1$ We assume $\mathrm{enc}(\mathrm{adoc}(u_i))$ is the empty word if $u_i \notin \mathrm{dom}(\mathrm{adoc})$.



We now prove Lemma 4 by induction over the structure of possible SPARQL expressions.

Base case: Assume that SPARQL expression $P$ is a triple pattern $tp$. W.l.o.g., let
$\mu \in Q^P(W)$. It holds $\mathrm{dom}(\mu) = \mathrm{vars}(tp)$ and $t = \mu[tp] \in \mathrm{AllData}(W)$ (cf. Definitions 8
and 15). According to Lemma 6 there exists a $j_\mu \in \{1, 2, \ldots\}$ such that
$\forall\, j \in \{j_\mu, j_\mu + 1, \ldots\} : t \in T_j$. Since $Q^P$ is monotonic we conclude
$\forall\, j \in \{j_\mu, j_\mu + 1, \ldots\} : \mu \in [\![P]\!]_{T_j}$.

Induction step: Our inductive hypothesis is that for SPARQL expressions $P_1$ and $P_2$ it
holds:

1. For each $\mu \in Q^{P_1}(W)$ there exists a $j_\mu \in \{1, 2, \ldots\}$ such that during the execution of
Algorithm 1 by $M^P$ it holds $\forall\, j \in \{j_\mu, j_\mu + 1, \ldots\} : \mu \in [\![P_1]\!]_{T_j}$; and

2. For each $\mu \in Q^{P_2}(W)$ there exists a $j_\mu \in \{1, 2, \ldots\}$ such that during the execution of
Algorithm 1 by $M^P$ it holds $\forall\, j \in \{j_\mu, j_\mu + 1, \ldots\} : \mu \in [\![P_2]\!]_{T_j}$.

Based on this hypothesis we show that for any SPARQL expression $P$ that can be
constructed using $P_1$ and $P_2$ it holds: For each $\mu \in Q^P(W)$ there exists a $j_\mu \in \{1, 2, \ldots\}$
such that during the execution of Algorithm 1 by $M^P$ it holds
$\forall\, j \in \{j_\mu, j_\mu + 1, \ldots\} : \mu \in [\![P]\!]_{T_j}$. W.l.o.g., let $\mu' \in Q^P(W)$. According to
Definition 14 we distinguish the following cases:

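- $P$ is $(P_1 \text{ AND } P_2)$. In this case there exist $\mu_1 \in Q^{P_1}(W)$ and $\mu_2 \in Q^{P_2}(W)$ such that
$\mu' = \mu_1 \cup \mu_2$ and $\mu_1 \sim \mu_2$. According to our inductive hypothesis there exist
$j_{\mu_1}, j_{\mu_2} \in \{1, 2, \ldots\}$ such that i) $\forall\, j \in \{j_{\mu_1}, j_{\mu_1} + 1, \ldots\} : \mu_1 \in [\![P_1]\!]_{T_j}$ and
ii) $\forall\, j \in \{j_{\mu_2}, j_{\mu_2} + 1, \ldots\} : \mu_2 \in [\![P_2]\!]_{T_j}$. Let $j_{\mu'} = \max(\{j_{\mu_1}, j_{\mu_2}\})$. Due to the
monotonicity of $Q^P$ it holds $\forall\, j \in \{j_{\mu'}, j_{\mu'} + 1, \ldots\} : \mu' \in [\![P]\!]_{T_j}$.

- $P$ is $(P_1 \text{ FILTER } R)$. In this case there exists $\mu^* \in Q^{P_1}(W)$ such that $\mu' = \mu^*$. According
to our inductive hypothesis there exists $j_{\mu^*} \in \{1, 2, \ldots\}$ such that
$\forall\, j \in \{j_{\mu^*}, j_{\mu^*} + 1, \ldots\} : \mu^* \in [\![P_1]\!]_{T_j}$. Due to the monotonicity of $Q^P$ it holds
$\forall\, j \in \{j_{\mu^*}, j_{\mu^*} + 1, \ldots\} : \mu' \in [\![P]\!]_{T_j}$.

- $P$ is $(P_1 \text{ OPT } P_2)$. We distinguish two cases:

1. There exist $\mu_1 \in Q^{P_1}(W)$ and $\mu_2 \in Q^{P_2}(W)$ such that $\mu' = \mu_1 \cup \mu_2$ and $\mu_1 \sim \mu_2$.
This case corresponds to the case where $P$ is $(P_1 \text{ AND } P_2)$ (see above).

2. There exists $\mu_1 \in Q^{P_1}(W)$ such that $\mu' = \mu_1$ and $\forall\, \mu_2 \in Q^{P_2}(W) : \mu_1 \not\sim \mu_2$.
According to our inductive hypothesis there exists $j_{\mu_1} \in \{1, 2, \ldots\}$ such that
$\forall\, j \in \{j_{\mu_1}, j_{\mu_1} + 1, \ldots\} : \mu_1 \in [\![P_1]\!]_{T_j}$. Due to the monotonicity of $Q^P$ it holds
$\forall\, j \in \{j_{\mu_1}, j_{\mu_1} + 1, \ldots\} : \mu' \in [\![P]\!]_{T_j}$.

- $P$ is $(P_1 \text{ UNION } P_2)$. We distinguish two cases:

1. There exists $\mu^* \in Q^{P_1}(W)$ such that $\mu' = \mu^*$. According to our inductive
hypothesis there exists $j_{\mu^*} \in \{1, 2, \ldots\}$ such that
$\forall\, j \in \{j_{\mu^*}, j_{\mu^*} + 1, \ldots\} : \mu^* \in [\![P_1]\!]_{T_j}$.

2. There exists $\mu^* \in Q^{P_2}(W)$ such that $\mu' = \mu^*$. According to our inductive
hypothesis there exists $j_{\mu^*} \in \{1, 2, \ldots\}$ such that
$\forall\, j \in \{j_{\mu^*}, j_{\mu^*} + 1, \ldots\} : \mu^* \in [\![P_2]\!]_{T_j}$.

Due to the monotonicity of $Q^P$ it holds for both cases:
$\forall\, j \in \{j_{\mu^*}, j_{\mu^*} + 1, \ldots\} : \mu' \in [\![P]\!]_{T_j}$.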



C.8 Proof of LemmaH] 

We prove the lemma by contradiction, that is, we assume a nontrivially satisfiable
SPARQL$_{\text{LD}}$ query $Q^P$ for which there exists an LD machine that, for some Web of Linked
Data $W$ encoded on the Web tape, halts after a finite number of computation steps and
produces an encoding of $Q^P(W)$ on its output tape. To obtain a contradiction we show
that such an LD machine and such a Web of Linked Data do not exist. However, for
the proof we assume $M'$ were such a machine and $W' = (D', \mathrm{data}', \mathrm{adoc}')$ were such
a Web of Linked Data.

Since $Q^P$ is nontrivially satisfiable it is possible that $W'$ is a Web of Linked Data
for which there exist solutions in $Q^P(W')$ such that each of these solutions provides a
binding for at least one variable. Hence, for computing $Q^P$ over $W'$ completely, machine
$M'$ requires access to the data of all LD documents $d \in D'$ (recall that
$Q^P(W') = [\![P]\!]_{\mathrm{AllData}(W')}$ where $\mathrm{AllData}(W') = \bigcup_{d \in D'} \mathrm{data}'(d)$). However, $M'$ may
access an LD document $d \in D'$ (and its data) only after performing the expand procedure
with a corresponding URI $u \in \mathcal{U}$ on the link traversal tape (i.e. for $u$ it must hold
$\mathrm{adoc}'(u) = d$). Initially, the machine has no information about which URI(s) to use for
accessing any $d \in D'$. Hence, to ensure that all $d \in D'$ have been accessed, $M'$ must
expand all $u \in \mathcal{U}$. Notice, a real query system for the WWW would have to perform a
similar procedure: To guarantee that such a system sees all documents, it must enumerate
and look up all URIs. However, since $\mathcal{U}$ is (countably) infinite, this process does not
terminate, which contradicts our assumption that $M'$ halts after a finite number
of computation steps.

C.9 Proof of Theorem 2

We formally define the termination problem for SPARQL$_{\text{LD}}$ as follows:

Problem: Termination(SPARQL$_{\text{LD}}$)
Web Input: a Web of Linked Data $W$
Ordinary Input: a satisfiable SPARQL$_{\text{LD}}$ query $Q^P$
Question: Does an LD machine exist that computes $Q^P(W)$ and halts?

We show that Termination(SPARQL$_{\text{LD}}$) is not LD machine decidable by reducing
the halting problem to Termination(SPARQL$_{\text{LD}}$).

The halting problem asks whether a given Turing machine (TM) halts on a given
input. For the reduction we assume an infinite Web of Linked Data $W_{\mathrm{TMs}}$ which we
define in the following. Informally, $W_{\mathrm{TMs}}$ describes all possible computations of all
TMs. For a formal definition of $W_{\mathrm{TMs}}$ we adopt the usual approach to unambiguously
describe TMs and their input by finite words over the (finite) alphabet of a universal
TM (e.g. [Pap93]). Let $\mathcal{W}$ be the countably infinite set of all words that describe TMs.
For each $w \in \mathcal{W}$ let $M(w)$ denote the machine described by $w$, let $c^{w,x}$ denote the
computation of $M(w)$ on input $x$, and let $u^{w,x}$ denote a URI that identifies $c^{w,x}$.
Furthermore, let $u_i^{w,x}$ denote a URI that identifies the $i$-th step in computation $c^{w,x}$. To
denote the (infinite) set of all such URIs we write $\mathcal{U}_{\mathrm{TMsteps}}$. Using the URIs $\mathcal{U}_{\mathrm{TMsteps}}$
we may unambiguously identify each step in each possible computation of any TM on
any given input. However, if a URI $u \in \mathcal{U}$ could potentially identify a computation step
of a TM on some input (because $u$ adheres to the pattern used for such URIs) but the
corresponding step may never exist, then $u \notin \mathcal{U}_{\mathrm{TMsteps}}$. For instance, if the computation
of a particular TM $M(w_j)$ on a particular input $x_k$ halts with the $i'$-th step, then
$\forall\, i \in \{1, \ldots, i'\} : u_i^{w_j,x_k} \in \mathcal{U}_{\mathrm{TMsteps}}$ and $\forall\, i \in \{i' + 1, \ldots\} : u_i^{w_j,x_k} \notin \mathcal{U}_{\mathrm{TMsteps}}$. Notice,
while the set $\mathcal{U}_{\mathrm{TMsteps}}$ is infinite, it is still countable because i) $\mathcal{W}$ is countably infinite,
ii) the set of all possible input words for TMs is countably infinite, and iii) $i$ is a natural
number.

We now define $W_{\mathrm{TMs}}$ as a Web of Linked Data $(D_{\mathrm{TMs}}, \mathrm{data}_{\mathrm{TMs}}, \mathrm{adoc}_{\mathrm{TMs}})$ with
the following elements: $D_{\mathrm{TMs}}$ consists of $|\mathcal{U}_{\mathrm{TMsteps}}|$ different LD documents, each of
which corresponds to one of the URIs in $\mathcal{U}_{\mathrm{TMsteps}}$ (and, thus, to a particular step in a
particular computation of a particular TM). Mapping $\mathrm{adoc}_{\mathrm{TMs}}$ is bijective and maps
each $u_i^{w,x} \in \mathcal{U}_{\mathrm{TMsteps}}$ to the corresponding $d_i^{w,x} \in D_{\mathrm{TMs}}$ ($\mathrm{dom}(\mathrm{adoc}_{\mathrm{TMs}}) = \mathcal{U}_{\mathrm{TMsteps}}$).
We emphasize that mapping $\mathrm{adoc}_{\mathrm{TMs}}$ is (Turing) computable because a universal TM
may determine by simulation whether the computation of a particular TM on a particular
input halts before a particular number of steps (i.e. whether the $i$-th step in
computation $c^{w,x}$ for a given URI $u_i^{w,x}$ may actually exist). Finally, mapping $\mathrm{data}_{\mathrm{TMs}}$ is
defined as follows: The set $\mathrm{data}_{\mathrm{TMs}}(d_i^{w,x})$ of RDF triples for an LD document $d_i^{w,x}$ is
empty if computation $c^{w,x}$ does not halt with the $i$-th computation step. Otherwise,
$\mathrm{data}_{\mathrm{TMs}}(d_i^{w,x})$ contains a single RDF triple $(u^{w,x}, \mathrm{type}, \mathrm{TerminatingComputation})$
where $\mathrm{type} \in \mathcal{U}$ and $\mathrm{TerminatingComputation} \in \mathcal{U}$. Formally:

$$\mathrm{data}_{\mathrm{TMs}}(d_i^{w,x}) = \begin{cases} \{ (u^{w,x}, \mathrm{type}, \mathrm{TerminatingComputation}) \} & \text{if computation } c^{w,x} \text{ halts with the } i\text{-th step,} \\ \emptyset & \text{else.} \end{cases}$$

Mapping $\mathrm{data}_{\mathrm{TMs}}$ is computable because a universal TM may determine by simulation
whether the computation of a particular TM on a particular input halts after a given
number of steps.
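
The computability argument for the data mapping can be made concrete with a minimal Python sketch. It rests on assumptions that are not part of the paper: a "machine" is modelled as a Python generator function that yields once per computation step and is exhausted when the computation halts, and the names `halts_with_step` and `data_tms` are hypothetical. The point is only that a simulation bounded by $i + 1$ steps suffices to decide what the document for step $i$ contains.

```python
from itertools import islice

TYPE = "type"
TERMINATING = "TerminatingComputation"
_NO_STEP = object()  # sentinel: the simulated computation has no further step


def halts_with_step(machine, x, i):
    """Return True iff the computation of `machine` on `x` halts with step i (i >= 1)."""
    steps = machine(x)
    # Simulate at most i steps; fewer than i means step i never exists.
    taken = sum(1 for _ in islice(steps, i))
    if taken < i:
        return False
    # Step i exists; it is the final step iff there is no step i + 1.
    return next(steps, _NO_STEP) is _NO_STEP


def data_tms(machine, x, i, computation_uri):
    """Triples of the document for step i, following the case distinction above."""
    if halts_with_step(machine, x, i):
        return {(computation_uri, TYPE, TERMINATING)}
    return set()


def countdown(x):
    """Toy machine: performs exactly x steps and then halts."""
    for remaining in range(x, 0, -1):
        yield remaining


def loop_forever(x):
    """Toy machine: never halts."""
    while True:
        yield x


if __name__ == "__main__":
    print(data_tms(countdown, 3, 3, "urn:example:c"))     # document of the last step
    print(data_tms(countdown, 3, 2, "urn:example:c"))     # an earlier step: empty
    print(data_tms(loop_forever, 0, 5, "urn:example:c"))  # non-halting: empty
```

Every call terminates even for `loop_forever`, because the simulation never advances beyond step $i + 1$; deciding whether some step with a non-empty document exists at all is a different matter, which is what the reduction below exploits.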

We now reduce the halting problem to Termination(SPARQL$_{\text{LD}}$). The input to
the halting problem is a pair $(w, x)$ consisting of a TM description $w$ and a possible
input word $x$. For the reduction we need a computable mapping $f$ that, given such a
pair $(w, x)$, produces a tuple $(W, Q^P)$ as input for Termination(SPARQL$_{\text{LD}}$). We
define $f$ as follows: Let $(w, x)$ be an input to the halting problem, then
$f(w, x) = (W_{\mathrm{TMs}}, Q^{P_{w,x}})$ with $P_{w,x} = (u^{w,x}, \mathrm{type}, \mathrm{TerminatingComputation})$. Given that $W_{\mathrm{TMs}}$
is independent of $(w, x)$, it is easy to see that $f$ is computable by TMs (including LD
machines).

We emphasize that for any possible $Q^{P_{w,x}}$ it holds:

$$Q^{P_{w,x}}(W_{\mathrm{TMs}}) = \begin{cases} \{ \mu_\emptyset \} & \text{if the computation of TM } M(w) \text{ on input } x \text{ halts,} \\ \emptyset & \text{else,} \end{cases}$$

where $\mu_\emptyset$ is the empty valuation with $\mathrm{dom}(\mu_\emptyset) = \emptyset$. Hence, any $Q^{P_{w,x}}$ is satisfiable
but not nontrivially satisfiable.

To show that Termination(SPARQL$_{\text{LD}}$) is not LD machine decidable, suppose
it were LD machine decidable. In such a case an LD machine could answer the halting
problem for any input $(w, x)$ as follows: $M(w)$ halts on $x$ if and only if an LD machine
exists that computes $Q^{P_{w,x}}(W_{\mathrm{TMs}}) = \{ \mu_\emptyset \}$ and halts. However, we know the halting
problem is undecidable for TMs (which includes LD machines). Hence, we have a
contradiction and, thus, Termination(SPARQL$_{\text{LD}}$) cannot be LD machine decidable.

C.10 Proof of Proposition 2

Let $W = (D, \mathrm{data}, \mathrm{adoc})$ be a finite Web of Linked Data. For each SPARQL$_{\text{LD}}$ query
$Q^P$ it holds $Q^P(W) = [\![P]\!]_{\mathrm{AllData}(W)}$ (cf. Definition 8). To prove the proposition it
suffices to show $[\![P]\!]_{\mathrm{AllData}(W)}$ is finite for any possible SPARQL expression $P$. We
use induction over the structure of SPARQL expressions for this proof:

Base case: Assume that SPARQL expression $P$ is a triple pattern $tp$. In this case
(cf. Definition 15)

$$[\![P]\!]_{\mathrm{AllData}(W)} = \{ \mu \mid \mu \text{ is a valuation with } \mathrm{dom}(\mu) = \mathrm{vars}(tp) \text{ and } \mu[tp] \in \mathrm{AllData}(W) \}.$$

Since $W$ (and, thus, $D$) is finite and for all $d \in D$ the set $\mathrm{data}(d)$ is finite, there exist
only a finite number of RDF triples in $\mathrm{AllData}(W) = \bigcup_{d \in D} \mathrm{data}(d)$. Hence, there
can be only a finite number of different valuations $\mu$ with $\mu[tp] \in \mathrm{AllData}(W)$ and,
thus, $[\![P]\!]_{\mathrm{AllData}(W)}$ must be finite.

Induction step: Our inductive hypothesis is that for SPARQL expressions $P_1$ and $P_2$,
$[\![P_1]\!]_{\mathrm{AllData}(W)}$ and $[\![P_2]\!]_{\mathrm{AllData}(W)}$ are finite. Based on this hypothesis we
show that for any SPARQL expression $P$ that can be constructed using $P_1$ and $P_2$
it holds that $[\![P]\!]_{\mathrm{AllData}(W)}$ is finite. According to Definition 14 we
distinguish the following cases:

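- $P$ is $(P_1 \text{ AND } P_2)$. In this case $[\![P]\!]_{\mathrm{AllData}(W)} = [\![P_1]\!]_{\mathrm{AllData}(W)} \bowtie [\![P_2]\!]_{\mathrm{AllData}(W)}$.
The result of the join may contain at most $|[\![P_1]\!]_{\mathrm{AllData}(W)}| \cdot |[\![P_2]\!]_{\mathrm{AllData}(W)}|$
elements, which is a finite number because $[\![P_1]\!]_{\mathrm{AllData}(W)}$ and $[\![P_2]\!]_{\mathrm{AllData}(W)}$ are
finite.

- $P$ is $(P_1 \text{ UNION } P_2)$. In this case $[\![P]\!]_{\mathrm{AllData}(W)} = [\![P_1]\!]_{\mathrm{AllData}(W)} \cup [\![P_2]\!]_{\mathrm{AllData}(W)}$.
The result of the union may contain at most $|[\![P_1]\!]_{\mathrm{AllData}(W)}| + |[\![P_2]\!]_{\mathrm{AllData}(W)}|$
elements, which is a finite number because $[\![P_1]\!]_{\mathrm{AllData}(W)}$ and $[\![P_2]\!]_{\mathrm{AllData}(W)}$ are
finite.

- $P$ is $(P_1 \text{ OPT } P_2)$. In this case $[\![P]\!]_{\mathrm{AllData}(W)} = [\![P_1]\!]_{\mathrm{AllData}(W)}$ ⟕ $[\![P_2]\!]_{\mathrm{AllData}(W)}$.
The result of the left outer join consists of the result of the join plus those valuations from
$[\![P_1]\!]_{\mathrm{AllData}(W)}$ that have no compatible partner. Hence, it contains at most
$|[\![P_1]\!]_{\mathrm{AllData}(W)}| \cdot |[\![P_2]\!]_{\mathrm{AllData}(W)}| + |[\![P_1]\!]_{\mathrm{AllData}(W)}|$
elements, which is a finite number because $[\![P_1]\!]_{\mathrm{AllData}(W)}$ and $[\![P_2]\!]_{\mathrm{AllData}(W)}$ are
finite.

- $P$ is $(P_1 \text{ FILTER } R)$. In this case $[\![P]\!]_{\mathrm{AllData}(W)} = \sigma_R([\![P_1]\!]_{\mathrm{AllData}(W)})$. The result
of the selection may contain at most $|[\![P_1]\!]_{\mathrm{AllData}(W)}|$ elements, which is a finite
number because $[\![P_1]\!]_{\mathrm{AllData}(W)}$ is finite.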
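
The cardinality bounds used in the induction step can be checked on small inputs with the following Python sketch. It is an illustration under stated assumptions rather than the formal algebra: valuations are represented as frozensets of (variable, value) pairs, and the helper names `mu`, `compatible`, `join`, `union`, `left_outer_join` and `select` are hypothetical. For the left outer join the sketch checks the bound $|\Omega_1| \cdot |\Omega_2| + |\Omega_1|$, which also covers valuations of the left operand that have no compatible partner.

```python
def mu(**bindings):
    """Build a valuation as a hashable frozenset of (variable, value) pairs."""
    return frozenset(bindings.items())


def compatible(m1, m2):
    """Two valuations are compatible if they agree on all shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d2[var] == val for var, val in d1.items() if var in d2)


def join(a, b):
    return {m1 | m2 for m1 in a for m2 in b if compatible(m1, m2)}


def union(a, b):
    return a | b


def left_outer_join(a, b):
    # Valuations of `a` without any compatible partner in `b` are kept as-is.
    unmatched = {m1 for m1 in a if not any(compatible(m1, m2) for m2 in b)}
    return join(a, b) | unmatched


def select(condition, a):
    return {m for m in a if condition(dict(m))}


if __name__ == "__main__":
    a = {mu(x="1"), mu(x="2")}
    b = {mu(x="1", y="a"), mu(x="1", y="b")}

    assert len(join(a, b)) <= len(a) * len(b)
    assert len(union(a, b)) <= len(a) + len(b)
    assert len(left_outer_join(a, b)) <= len(a) * len(b) + len(a)
    assert len(select(lambda m: m["x"] == "1", a)) <= len(a)

    # With an empty right operand the left outer join returns the left operand.
    assert left_outer_join(a, set()) == a
    print(sorted(sorted(m) for m in left_outer_join(a, b)))
```

All four operators map finite inputs to finite outputs, which is the only property the induction step needs.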



C.11 Proof of Theorem 3

We formally define the finiteness problem for SPARQL$_{\text{LD}}$ as follows:

Problem: Finiteness(SPARQL$_{\text{LD}}$)
Web Input: a (potentially infinite) Web of Linked Data $W$
Ordinary Input: a satisfiable SPARQL expression $P$
Question: Is the result of (the satisfiable) SPARQL$_{\text{LD}}$ query $Q^P$ over $W$ finite?

We show that Finiteness(SPARQL$_{\text{LD}}$) is not LD machine decidable by reducing the
halting problem to Finiteness(SPARQL$_{\text{LD}}$). While this proof is similar to the proof
of Theorem 2 (cf. Section C.9), we now use a Web of Linked Data $W_{\mathrm{TMs2}}$ which differs
from $W_{\mathrm{TMs}}$ in the way it describes all possible computations of all Turing machines
(TM).

For the proof we use the same symbols as in Section C.9. That is, $\mathcal{W}$ denotes the
countably infinite set of all words that describe TMs; $M(w)$ denotes the machine
described by $w$ (for all $w \in \mathcal{W}$); $c^{w,x}$ denotes the computation of $M(w)$ on input $x$;
$u_i^{w,x} \in \mathcal{U}$ identifies the $i$-th step in computation $c^{w,x}$. The set of all these identifiers is
denoted by $\mathcal{U}_{\mathrm{TMsteps}}$ (recall that, although $\mathcal{U}_{\mathrm{TMsteps}}$ is infinite, it is countable).

We now define $W_{\mathrm{TMs2}}$ as a Web of Linked Data $(D_{\mathrm{TMs2}}, \mathrm{data}_{\mathrm{TMs2}}, \mathrm{adoc}_{\mathrm{TMs2}})$ similar
to the Web $W_{\mathrm{TMs}}$ used in Section C.9: $D_{\mathrm{TMs2}}$ and $\mathrm{adoc}_{\mathrm{TMs2}}$ are the same as in
$W_{\mathrm{TMs}}$. That is, $D_{\mathrm{TMs2}}$ consists of $|\mathcal{U}_{\mathrm{TMsteps}}|$ different LD documents, each of which
corresponds to one of the URIs in $\mathcal{U}_{\mathrm{TMsteps}}$. Mapping $\mathrm{adoc}_{\mathrm{TMs2}}$ is bijective and maps each
$u_i^{w,x} \in \mathcal{U}_{\mathrm{TMsteps}}$ to the corresponding $d_i^{w,x} \in D_{\mathrm{TMs2}}$. Mapping $\mathrm{data}_{\mathrm{TMs2}}$ for $W_{\mathrm{TMs2}}$
is different from the corresponding mapping for $W_{\mathrm{TMs}}$: The set $\mathrm{data}_{\mathrm{TMs2}}(d_i^{w,x})$ of
RDF triples for each LD document $d_i^{w,x}$ contains a single RDF triple $(u_i^{w,x}, \mathrm{first}, u_1^{w,x})$
which associates the corresponding computation step $u_i^{w,x}$ with the first step of the
corresponding computation $c^{w,x}$ ($\mathrm{first} \in \mathcal{U}$ denotes a URI for this relationship).

Before we come to the reduction we highlight a property of $W_{\mathrm{TMs2}}$ that is important
for our proof. Each RDF triple $(u_i^{w,x}, \mathrm{first}, u_1^{w,x})$ establishes a data link from $d_i^{w,x}$
to $d_1^{w,x}$. Hence, the link graph of $W_{\mathrm{TMs2}}$ consists of an infinite number of separate
subgraphs, each of which corresponds to a particular computation $c^{w,x}$, is weakly
connected, and has a star-like form where the corresponding $d_1^{w,x}$ is in the center of the star.
More precisely, for subgraph $(V^{w_j,x_k}, E^{w_j,x_k})$ that corresponds to computation $c^{w_j,x_k}$
it holds

$$V^{w_j,x_k} = \{ d_i^{w,x} \in D_{\mathrm{TMs2}} \mid w = w_j \text{ and } x = x_k \}$$

and

$$E^{w_j,x_k} = V^{w_j,x_k} \times \{ d_1^{w_j,x_k} \}.$$

Each of these subgraphs is infinitely large (i.e. has an infinite number of vertices) if and
only if the corresponding computation does not halt, because $\mathcal{U}_{\mathrm{TMsteps}}$ contains only
finitely many step URIs for a halting computation.

For the reduction we use mapping $f$ which is defined as follows: Let $w$ be the
description of a TM, let $x$ be a possible input word for $M(w)$, and let $?v \in \mathcal{V}$ be a
query variable, then $f(w, x) = (W_{\mathrm{TMs2}}, P_{w,x})$ with $P_{w,x} = (?v, \mathrm{first}, u_1^{w,x})$. Given
that $W_{\mathrm{TMs2}}$ is independent of $(w, x)$, it is easy to see that $f$ is computable by TMs
(including LD machines).

To show that Finiteness(SPARQL$_{\text{LD}}$) is not LD machine decidable, suppose it
were LD machine decidable. In such a case an LD machine could answer the halting
problem for any input $(w, x)$ as follows: $M(w)$ halts on $x$ if and only if $Q^{P_{w,x}}(W_{\mathrm{TMs2}})$
is finite. However, we know the halting problem is undecidable for TMs (which includes
LD machines). Hence, we have a contradiction and, thus, Finiteness(SPARQL$_{\text{LD}}$)
cannot be LD machine decidable.



C.12 Proof of Proposition 3, Case 1

Let:

- $Q^P$ be a SPARQL$_{\text{LD}}$ query that is monotonic;

- $Q_c^{P,S}$ be a SPARQL$_{\text{LD(R)}}$ query that uses the same SPARQL expression $P$ as $Q^P$
(and an arbitrary reachability criterion $c$ and (finite) set $S \subset \mathcal{U}$ of seed URIs);

- $W$ be a Web of Linked Data; and

- $W_c^{(S,P)}$ denote the $(S, c, P)$-reachable part of $W$.

W.l.o.g. it suffices to show: $Q_c^{P,S}(W) \subseteq Q^P(W)$.

Since $W_c^{(S,P)}$ is an induced subweb of $W$ (cf. Definition 11) and $Q^P$ is monotonic,
it holds $Q^P(W_c^{(S,P)}) \subseteq Q^P(W)$. Furthermore, we have $Q_c^{P,S}(W) = Q^P(W_c^{(S,P)})$
(cf. Proposition 3, case 2). Hence, $Q_c^{P,S}(W) \subseteq Q^P(W)$.

C.13 Proof of Proposition 3, Case 2

Let:

- $Q_c^{P,S}$ be a SPARQL$_{\text{LD(R)}}$ query (that uses SPARQL expression $P$, reachability
criterion $c$, and (finite) set $S \subset \mathcal{U}$ of seed URIs);

- $Q^P$ be a SPARQL$_{\text{LD}}$ query that uses the same SPARQL expression $P$ as $Q_c^{P,S}$;

- $W$ be an arbitrary Web of Linked Data; and

- $W_c^{(S,P)}$ denote the $(S, c, P)$-reachable part of $W$.

It holds:

- $Q_c^{P,S}(W) = [\![P]\!]_{\mathrm{AllData}(W_c^{(S,P)})}$ (cf. Definition 12) and

- $Q^P(W_c^{(S,P)}) = [\![P]\!]_{\mathrm{AllData}(W_c^{(S,P)})}$ (cf. Definition 8).

Hence, $Q_c^{P,S}(W) = Q^P(W_c^{(S,P)})$.



C.14 Proof of Proposition 4

Let:

- $Q_c^{P,S}$ be a SPARQL$_{\text{LD(R)}}$ query (that uses SPARQL expression $P$, reachability
criterion $c$, and (finite) set $S \subset \mathcal{U}$ of seed URIs);

- $W = (D, \mathrm{data}, \mathrm{adoc})$ be a finite Web of Linked Data; and

- $W_c^{(S,P)} = (D_{\mathfrak{R}}, \mathrm{data}_{\mathfrak{R}}, \mathrm{adoc}_{\mathfrak{R}})$ be the $(S, c, P)$-reachable part of $W$.

W.l.o.g. it suffices to show: $Q_c^{P,S}(W)$ is finite and $W_c^{(S,P)}$ is finite.

$W$ is finite, which means $D$ is finite. Therefore, any subset of $D$ must also be finite;
this includes $D_{\mathfrak{R}} \subseteq D$ because $W_c^{(S,P)}$ is an induced subweb of $W$ (cf. Definition 11).
Hence, $W_c^{(S,P)}$ is finite.

The finiteness of $Q_c^{P,S}(W)$ follows directly from the finiteness of $W_c^{(S,P)}$ (and is
independent of the finiteness of $W$) as the following lemma shows.

Lemma 7. For any SPARQL$_{\text{LD(R)}}$ query $Q_c^{P,S}$ and any (potentially infinite) Web of
Linked Data $W$ it holds: If $W_c^{(S,P)}$ is finite, then $Q_c^{P,S}(W)$ is finite.

Proof of Lemma 7. The lemma follows directly from Proposition 3 (case 2) and Proposition 2. □



C.15 Proof of Proposition 5

Let:

- $P$ be a SPARQL expression;

- $c$ and $c'$ be reachability criteria;

- $S \subset \mathcal{U}$ be a finite but nonempty set of seed URIs;

- $W = (D, \mathrm{data}, \mathrm{adoc})$ be an infinite Web of Linked Data.

1. $W_{c_{\mathrm{None}}}^{(S,P)}$ is always finite; so is $Q_{c_{\mathrm{None}}}^{P,S}(W)$.

Let $D_{\mathfrak{R}}$ denote the set of all LD documents in $W_{c_{\mathrm{None}}}^{(S,P)}$. Since $c_{\mathrm{None}}$ always returns
false it is easily verified that there is no LD document $d \in D$ that satisfies case 2
in Definition 10. Hence, it must hold $D_{\mathfrak{R}} = \{ d \in D \mid u \in S \text{ and } \mathrm{adoc}(u) = d \}$
(cf. case 1 in Definition 10). Since $S$ is finite we see that $D_{\mathfrak{R}}$ is guaranteed to be finite
(and so is $W_{c_{\mathrm{None}}}^{(S,P)}$). The finiteness of $Q_{c_{\mathrm{None}}}^{P,S}(W)$ can then be shown using Lemma 7
(cf. Section C.14).

2. If $W_c^{(S,P)}$ is finite, then $Q_c^{P,S}(W)$ is finite.

See Lemma 7 in Section C.14.

3. If $Q_c^{P,S}(W)$ is infinite, then $W_c^{(S,P)}$ is infinite.

Let $Q_c^{P,S}(W)$ be infinite. We prove by contradiction that $W_c^{(S,P)}$ is infinite: Suppose
$W_c^{(S,P)}$ were finite. In this case $Q_c^{P,S}(W)$ would be finite (cf. Lemma 7 in
Section C.14), which is a contradiction to our premise. Hence, $W_c^{(S,P)}$ must be infinite.

4. If $c$ is less restrictive than $c'$ and $W_c^{(S,P)}$ is finite, then $W_{c'}^{(S,P)}$ is finite.

If $W_c^{(S,P)}$ is finite, then there exist finitely many LD documents $d \in D$ that are
$(c, P)$-reachable from $S$ in $W$. A subset of them is also $(c', P)$-reachable from $S$ in $W$
because $c$ is less restrictive than $c'$. Hence, $W_{c'}^{(S,P)}$ must also be finite.

5. If $c'$ is less restrictive than $c$ and $W_c^{(S,P)}$ is infinite, then $W_{c'}^{(S,P)}$ is infinite.

If $W_c^{(S,P)}$ is infinite, then there exist infinitely many LD documents $d \in D$ that are
$(c, P)$-reachable from $S$ in $W$. Each of them is also $(c', P)$-reachable from $S$ in $W$
because $c'$ is less restrictive than $c$. Hence, $W_{c'}^{(S,P)}$ must be infinite.
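
The nesting of reachable parts that cases 4 and 5 rely on can be observed on a small finite Web with the following Python sketch. It makes several simplifying assumptions that are not part of the formal definitions: each LD document is identified with the single URI that retrieves it, the Web is a dictionary from URIs to sets of triples, $P$ is a single triple pattern, and the criteria `c_none`, `c_match` and `c_all` are simplified stand-ins for the corresponding reachability criteria. The function name `reachable_part` is hypothetical.

```python
from collections import deque


def c_none(triple, uri, pattern):
    """Never follow a data link."""
    return False


def c_all(triple, uri, pattern):
    """Follow every data link."""
    return True


def c_match(triple, uri, pattern):
    """Follow links only from triples that match the pattern (variables start with '?').

    Simplification: repeated variables in the pattern are not checked.
    """
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def reachable_part(web, seeds, criterion, pattern):
    """Return the URIs of all documents reachable from the seed URIs."""
    reached = set()
    queue = deque(u for u in seeds if u in web)   # documents retrieved via seed URIs
    while queue:
        u = queue.popleft()
        if u in reached:
            continue
        reached.add(u)
        for triple in web[u]:
            for term in triple:
                # Follow a link only if the criterion accepts it.
                if term in web and term not in reached and criterion(triple, term, pattern):
                    queue.append(term)
    return reached


if __name__ == "__main__":
    web = {
        "s": {("s", "next", "a"), ("s", "other", "b")},
        "a": {("a", "next", "c")},
        "b": set(),
        "c": set(),
    }
    pattern = ("?x", "next", "?y")
    for name, criterion in [("none", c_none), ("match", c_match), ("all", c_all)]:
        print(name, sorted(reachable_part(web, {"s"}, criterion, pattern)))
```

On this example the reachable sets grow as the criterion becomes less restrictive, so finiteness carries over to the more restrictive criterion and infiniteness to the less restrictive one.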



C.16 Proof of Theorem 4

We formally define the decision problems FinitenessReachablePart and
Finiteness(SPARQL$_{\text{LD(R)}}$) as follows:

Problem: FinitenessReachablePart
Web Input: a (potentially infinite) Web of Linked Data $W$
Ordinary Input: a finite but nonempty set $S \subset \mathcal{U}$ of seed URIs
                a reachability criterion $c$ that is less restrictive than $c_{\mathrm{None}}$
                a SPARQL expression $P$
Question: Is the $(S, c, P)$-reachable part of $W$ finite?

Problem: Finiteness(SPARQL$_{\text{LD(R)}}$)
Web Input: a (potentially infinite) Web of Linked Data $W$
Ordinary Input: a finite but nonempty set $S \subset \mathcal{U}$ of seed URIs
                a reachability criterion $c$ that is less restrictive than $c_{\mathrm{None}}$
                a SPARQL expression $P$
Question: Is the result of SPARQL$_{\text{LD(R)}}$ query $Q_c^{P,S}$ over $W$ finite?

We now prove Theorem 4 by reducing the halting problem to FinitenessReachablePart
and Finiteness(SPARQL$_{\text{LD(R)}}$). While this proof is similar to the proofs
of Theorem 2 (cf. Section C.9) and Theorem 3 (cf. Section C.11), we now use a Web of
Linked Data $W_{\mathrm{TMs3}}$ which (again) differs from $W_{\mathrm{TMs}}$ and $W_{\mathrm{TMs2}}$ in the way it describes
all possible computations of all Turing machines (TM).

For the proof we use the same symbols as in Section C.9. That is, $\mathcal{W}$ denotes the
countably infinite set of all words that describe TMs; $M(w)$ denotes the machine
described by $w$ (for all $w \in \mathcal{W}$); $c^{w,x}$ denotes the computation of $M(w)$ on input $x$;
$u_i^{w,x} \in \mathcal{U}$ identifies the $i$-th step in computation $c^{w,x}$. The set of all these identifiers is
denoted by $\mathcal{U}_{\mathrm{TMsteps}}$ (recall that, although $\mathcal{U}_{\mathrm{TMsteps}}$ is infinite, it is countable).

We now define $W_{\mathrm{TMs3}}$ as a Web of Linked Data $(D_{\mathrm{TMs3}}, \mathrm{data}_{\mathrm{TMs3}}, \mathrm{adoc}_{\mathrm{TMs3}})$ similar
to the Web $W_{\mathrm{TMs}}$ used in Section C.9: $D_{\mathrm{TMs3}}$ and $\mathrm{adoc}_{\mathrm{TMs3}}$ are the same as in
$W_{\mathrm{TMs}}$. That is, $D_{\mathrm{TMs3}}$ consists of $|\mathcal{U}_{\mathrm{TMsteps}}|$ different LD documents, each of which
corresponds to one of the URIs in $\mathcal{U}_{\mathrm{TMsteps}}$. Mapping $\mathrm{adoc}_{\mathrm{TMs3}}$ is bijective and maps each
$u_i^{w,x} \in \mathcal{U}_{\mathrm{TMsteps}}$ to the corresponding $d_i^{w,x} \in D_{\mathrm{TMs3}}$. Mapping $\mathrm{data}_{\mathrm{TMs3}}$ for $W_{\mathrm{TMs3}}$ is
different from the corresponding mapping for $W_{\mathrm{TMs}}$: The set $\mathrm{data}_{\mathrm{TMs3}}(d_i^{w,x})$ of RDF
triples for an LD document $d_i^{w,x}$ is empty if computation $c^{w,x}$ halts with the $i$-th
computation step. Otherwise, $\mathrm{data}_{\mathrm{TMs3}}(d_i^{w,x})$ contains a single RDF triple $(u_i^{w,x}, \mathrm{next}, u_{i+1}^{w,x})$
which associates the computation step $u_i^{w,x}$ with the next step in $c^{w,x}$ ($\mathrm{next} \in \mathcal{U}$ denotes
a URI for this relationship). Formally:

$$\mathrm{data}_{\mathrm{TMs3}}(d_i^{w,x}) = \begin{cases} \emptyset & \text{if computation } c^{w,x} \text{ halts with the } i\text{-th step,} \\ \{ (u_i^{w,x}, \mathrm{next}, u_{i+1}^{w,x}) \} & \text{else.} \end{cases}$$

Mapping $\mathrm{data}_{\mathrm{TMs3}}$ is (Turing) computable because a universal TM may determine by
simulation whether the computation of a particular TM on a particular input halts after
a given number of steps.

Before we come to the reduction we highlight a property of $W_{\mathrm{TMs3}}$ that is important
for our proof. Each RDF triple $(u_i^{w,x}, \mathrm{next}, u_{i+1}^{w,x})$ establishes a data link from $d_i^{w,x}$ to
$d_{i+1}^{w,x}$. Due to such links we may recursively reach all LD documents about all steps in
a particular computation of any TM. Hence, for each possible computation $c^{w,x}$ of any
TM $M(w)$ we have a (potentially infinite) simple path $(d_1^{w,x}, \ldots, d_i^{w,x}, \ldots)$ in the link
graph of $W_{\mathrm{TMs3}}$. Each of these paths is finite if and only if the corresponding
computation halts. Finally, we note that each of these paths forms a separate subgraph of the
link graph of $W_{\mathrm{TMs3}}$ because we use a separate set of step URIs for each computation
and the RDF triples in the corresponding LD documents only mention steps from the
same computation.

For the reduction we use mapping $f$ which is defined as follows: Let $w$ be the
description of a TM, let $x$ be a possible input word for $M(w)$, and let $?a, ?b \in \mathcal{V}$ be
two distinct query variables, then $f(w, x) = (W_{\mathrm{TMs3}}, S_{w,x}, c_{\mathrm{Match}}, P_{w,x})$ with
$S_{w,x} = \{ u_1^{w,x} \}$ and $P_{w,x} = (?a, \mathrm{next}, ?b)$. Given that $c_{\mathrm{Match}}$ and $W_{\mathrm{TMs3}}$ are independent of
$(w, x)$, it can be easily seen that $f$ is computable by TMs (including LD machines).

To show that FinitenessReachablePart is not LD machine decidable, suppose
it were LD machine decidable. In such a case an LD machine could answer the
halting problem for any input $(w, x)$ as follows: $M(w)$ halts on $x$ if and only if the
$(S_{w,x}, c_{\mathrm{Match}}, P_{w,x})$-reachable part of $W_{\mathrm{TMs3}}$ is finite. However, we know the halting
problem is undecidable for TMs (which includes LD machines). Hence, we have a
contradiction and, thus, FinitenessReachablePart cannot be LD machine decidable.

The proof that Finiteness(SPARQL$_{\text{LD(R)}}$) is undecidable is similar to that for
FinitenessReachablePart. Hence, we only outline the idea: Instead of reducing
the halting problem to FinitenessReachablePart based on mapping $f$ we now
reduce the halting problem to Finiteness(SPARQL$_{\text{LD(R)}}$) using the same mapping. If
Finiteness(SPARQL$_{\text{LD(R)}}$) were decidable then we could answer the halting problem
for any $(w, x)$: $M(w)$ halts on $x$ if and only if $Q_{c_{\mathrm{Match}}}^{P_{w,x},S_{w,x}}(W_{\mathrm{TMs3}})$ is finite.

C.17 Proof of Proposition|6l CaseH 

This proof is similar to the proof of Proposition [T] case[T](cf. Section lCZl i. 

If: Let P be a SPARQL expression that is satisfiable and let Q^'^ be an arbitrary 
SPARQLld(R) query that uses P, a nonempty set 5 C W of seed URIs, and an arbi- 
trary reachability criterion c. W.l.o.g. it suffices to show that Q^'^ is satisfiable. 



For this proof we use the notion of {P, G)-lineage of valuations that we introduced 
before (cf. Definition [T6l in Section ICZt . Recall that for any SPARQL expression P, 
any (potentially infinite) set G of RDF triples, and any valuation /_* G iPja it holds: 
i) G' = lin^'^(^) is finite and ii) ^i e {Pjc- 

Due to the satisfiability of P there exists a set of RDF triples G such that [[P]]_G ≠ ∅.
W.l.o.g., let μ be an arbitrary solution for P in G, that is, μ ∈ [[P]]_G. Furthermore, let
G' = lin^{P,G}(μ) be the (P, G)-lineage of μ. We use G' to construct a Web of Linked
Data W_μ = (D_μ, data_μ, adoc_μ) which consists of a single LD document. This
document may be retrieved using any URI from the (nonempty) set S and it contains the
(P, G)-lineage of μ (recall that the lineage is guaranteed to be finite). Formally:

D_μ = {d}     data_μ(d) = G'     ∀ u ∈ S : adoc_μ(u) = d

Due to our construction it holds AllData(W_μ) = AllData(W_R) = G' where W_R
denotes the (S, c, P)-reachable part of W_μ. Thus, we have Q_c^{P,S}(W_μ) = [[P]]_{G'}
(cf. Definition 12). Since we know μ ∈ [[P]]_{G'} it holds Q_c^{P,S}(W_μ) ≠ ∅, which shows
that Q_c^{P,S} is satisfiable.
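The single-document construction above can be made concrete with a small sketch. It is only an illustration under stated assumptions: a Web of Linked Data is modelled as plain Python dictionaries, triples are tuples of strings, and the names `single_document_web`, `all_data`, and the document identifier "d" are hypothetical helpers that do not appear in the paper.

```python
def single_document_web(lineage, seeds):
    """Return (D, data, adoc) for a Web whose only document holds `lineage`.

    Assumption: `lineage` is a finite set of triples (tuples of strings) and
    `seeds` is the nonempty set S of seed URIs.
    """
    if not seeds:
        raise ValueError("the seed set S must be nonempty")
    d = "d"  # hypothetical identifier of the single LD document
    data = {d: frozenset(lineage)}   # data(d) = G'
    adoc = {u: d for u in seeds}     # every seed URI dereferences to d
    return {d}, data, adoc


def all_data(docs, data):
    """AllData(W): the union of the triples of all documents in W."""
    triples = set()
    for d in docs:
        triples |= data[d]
    return triples
```

Because every seed URI maps to the only document, that document is reachable under any reachability criterion, so the reachable part coincides with the whole Web and its data is exactly the lineage.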

Only if: Let Q_c^{P,S} be a satisfiable SPARQL_LD(R) query that uses SPARQL expression
P, a nonempty set S ⊂ U of seed URIs, and an arbitrary reachability criterion c. Since
Q_c^{P,S} is satisfiable, there exists a Web of Linked Data W such that Q_c^{P,S}(W) ≠ ∅.
According to Definition 12 we also have Q_c^{P,S}(W) = [[P]]_{AllData(W_R)}, where W_R denotes
the (S, c, P)-reachable part of W. Thus, we conclude that P is satisfiable.



C.18 Proof of Proposition 6, Case 2

This proof is similar to the proof of Proposition 1, case 2 (cf. Section C.3).

If: Let P be a SPARQL expression that is nontrivially satisfiable and let Q_c^{P,S} be an
arbitrary SPARQL_LD(R) query that uses P, a nonempty set S ⊂ U of seed URIs, and
an arbitrary reachability criterion c. W.l.o.g. it suffices to show that Q_c^{P,S} is nontrivially
satisfiable.

Due to the nontrivial satisfiability of P there exist a set of RDF triples G and a valuation
μ such that i) μ ∈ [[P]]_G and ii) dom(μ) ≠ ∅. Let G' = lin^{P,G}(μ) be the (P, G)-lineage
of μ. We use G' to construct a Web of Linked Data W_μ = (D_μ, data_μ, adoc_μ) which
consists of a single LD document. This document may be retrieved using any URI from
the (nonempty) set S and it contains the (P, G)-lineage of μ (recall that the lineage is
guaranteed to be finite). Formally:

D_μ = {d}     data_μ(d) = G'     ∀ u ∈ S : adoc_μ(u) = d

Due to our construction it holds AllData(W_μ) = AllData(W_R) = G' where W_R
denotes the (S, c, P)-reachable part of W_μ. Thus, we have Q_c^{P,S}(W_μ) = [[P]]_{G'}
(cf. Definition 12). Since we know μ ∈ [[P]]_{G'} and dom(μ) ≠ ∅, we conclude that Q_c^{P,S} is
nontrivially satisfiable.

Only if: Let Q_c^{P,S} be a nontrivially satisfiable SPARQL_LD(R) query that uses SPARQL
expression P, a nonempty set S ⊂ U, and an arbitrary reachability criterion c. Since
Q_c^{P,S} is nontrivially satisfiable, there exist a Web of Linked Data W and a valuation μ such
that i) μ ∈ Q_c^{P,S}(W) and ii) dom(μ) ≠ ∅. According to Definition 12 we also have
Q_c^{P,S}(W) = [[P]]_{AllData(W_R)}, where W_R denotes the (S, c, P)-reachable part of W. Thus,
we conclude that P is nontrivially satisfiable.

C.19 Proof of Proposition 6, Case 3

If: Let:

- P be a SPARQL expression that is monotonic;

- Q_c^{P,S} be an arbitrary SPARQL_LD(R) query that uses P, a nonempty set S ⊂ U of
seed URIs, and an arbitrary reachability criterion c;

- W_1, W_2 be an arbitrary pair of Webs of Linked Data such that W_1 is an induced
subweb of W_2; and

- W_{R1} = (D_{R1}, data_{R1}, adoc_{R1}) and W_{R2} = (D_{R2}, data_{R2}, adoc_{R2}) denote the
(S, c, P)-reachable part of W_1 and of W_2, respectively.

To prove that Q_c^{P,S} is monotonic it suffices to show Q_c^{P,S}(W_1) ⊆ Q_c^{P,S}(W_2).

Any LD document that is (c, P)-reachable from S in W_1 is also (c, P)-reachable
from S in W_2 because W_1 is an induced subweb of W_2. Hence, D_{R1} ⊆ D_{R2} and, thus,
AllData(W_{R1}) ⊆ AllData(W_{R2}). Furthermore, Q_c^{P,S}(W_1) = [[P]]_{AllData(W_{R1})} and
Q_c^{P,S}(W_2) = [[P]]_{AllData(W_{R2})} (cf. Definition 12). Due to the monotonicity of P it also
holds [[P]]_{AllData(W_{R1})} ⊆ [[P]]_{AllData(W_{R2})}. Hence, Q_c^{P,S}(W_1) ⊆ Q_c^{P,S}(W_2).
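The key step of this argument, that reachability in an induced subweb implies reachability in the larger Web, can be checked on a small example. The following is a minimal sketch, not the paper's formalism: Webs are modelled as dictionaries, every component of a triple is treated as a URI, and `reachable_docs` and `induced_subweb` are hypothetical helper names.

```python
from collections import deque


def reachable_docs(seeds, c, pattern, data, adoc):
    """Documents that are (c, P)-reachable from the seed URIs."""
    reached, queue = set(), deque()
    for u in seeds:
        d = adoc.get(u)  # None models a URI outside dom(adoc)
        if d is not None and d not in reached:
            reached.add(d)
            queue.append(d)
    while queue:
        d = queue.popleft()
        for t in data[d]:
            for u in t:  # assumption: all triple components are URIs
                if c(t, u, pattern):
                    nxt = adoc.get(u)
                    if nxt is not None and nxt not in reached:
                        reached.add(nxt)
                        queue.append(nxt)
    return reached


def induced_subweb(docs, data, adoc):
    """Restrict a Web to `docs`; adoc stays defined only for URIs mapped into `docs`."""
    sub_data = {d: data[d] for d in docs}
    sub_adoc = {u: d for u, d in adoc.items() if d in docs}
    return sub_data, sub_adoc
```

For a cyclic three-document Web and the subweb induced by two of its documents, the reachable set of the subweb is contained in that of the full Web, which is the inclusion D_{R1} ⊆ D_{R2} used in the proof.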

C.20 Proof of Proposition 7

Let:

- Q_{c_None}^{P,S} be a SPARQL_LD(R) query (under c_None-semantics) such that |S| = 1;

- W_1 = (D_1, data_1, adoc_1) and W_2 = (D_2, data_2, adoc_2) be two Webs of Linked
Data such that W_1 is an induced subweb of W_2; and

- W_1^R and W_2^R denote the (S, c_None, P)-reachable part of W_1 and of W_2,
respectively.

W.l.o.g. it suffices to show Q_{c_None}^{P,S}(W_1) ⊆ Q_{c_None}^{P,S}(W_2). We distinguish the
following cases for u ∈ S = {u}:

1. u ∉ dom(adoc_1) and u ∉ dom(adoc_2).

In this case W_1^R and W_2^R are equal to the empty Web (which contains no LD
documents). Hence, Q_{c_None}^{P,S}(W_1) = Q_{c_None}^{P,S}(W_2) = ∅.

2. u ∉ dom(adoc_1) and adoc_2(u) = d where d ∈ D_2.

In this case W_1^R is equal to the empty Web, whereas W_2^R contains a single LD
document, namely d. Hence, Q_{c_None}^{P,S}(W_1) = ∅ and Q_{c_None}^{P,S}(W_2) = [[P]]_{AllData(W_2^R)} =
[[P]]_{data_2(d)} and, thus, Q_{c_None}^{P,S}(W_1) ⊆ Q_{c_None}^{P,S}(W_2).

3. adoc_1(u) = d and adoc_2(u) = d where d ∈ D_1 ⊆ D_2.

In this case both Webs, W_1^R and W_2^R, contain a single LD document, namely
d. Hence, Q_{c_None}^{P,S}(W_1) = Q_{c_None}^{P,S}(W_2) = [[P]]_{data_1(d)} (recall that in this case
data_1(d) = data_2(d) holds).

4. u ∈ dom(adoc_1) and u ∉ dom(adoc_2).

This case is impossible because W_1 is an induced subweb of W_2.



Algorithm 2 The program of the (P, S, c)'-machine.



1: Call lookup for each u ∈ S.

2: expansionCompleted := false

3: while expansionCompleted = false do

4: Scan the link traversal tape for an RDF triple t and a URI u ∈ uris(t) such that
i) c(t, u, P) = true and ii) the word on the link traversal tape neither contains
enc(u) enc(adoc(u)) ♯ nor enc(u) ♯. If such t and u exist, call lookup for u;
otherwise expansionCompleted := true.

5: end while

6: Let G denote the set of all RDF triples currently encoded on the link traversal tape. For each
μ ∈ [[P]]_G add enc(μ) to the output.
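The program of Algorithm 2 can be sketched in ordinary code. This is only an approximation under explicit assumptions: tapes are replaced by Python sets, the Web is given as dictionaries `data` and `adoc`, every component of a triple counts as a URI, and P is restricted to a single triple pattern. The names `eval_triple_pattern` and `run_prime_machine` are hypothetical.

```python
def eval_triple_pattern(tp, triples):
    """Evaluate one triple pattern; variables are strings starting with '?'."""
    solutions = set()
    for t in triples:
        mu = {}
        for term, value in zip(tp, t):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break  # same variable bound to two different values
            elif term != value:
                break      # constant does not match
        else:
            solutions.add(frozenset(mu.items()))
    return solutions


def run_prime_machine(pattern, seeds, c, data, adoc):
    tape = set()  # triples copied to the link traversal tape so far
    seen = set()  # URIs for which lookup has already been called

    def lookup(u):
        seen.add(u)
        d = adoc.get(u)  # None models a URI outside dom(adoc)
        if d is not None:
            tape.update(data[d])

    for u in sorted(seeds):                    # line 1
        lookup(u)
    expansion_completed = False                # line 2
    while not expansion_completed:             # line 3
        candidate = next(                      # line 4
            (u for t in sorted(tape) for u in sorted(set(t))
             if c(t, u, pattern) and u not in seen),
            None,
        )
        if candidate is None:
            expansion_completed = True
        else:
            lookup(candidate)
    return eval_triple_pattern(pattern, tape)  # line 6
```

As in the algorithm, the query is evaluated only once, after the traversal has reached a fixpoint, so this function returns only if the reachable part of the input Web is finite.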



C.21 Proof of Lemma 2

For proving Lemma 2 we introduce specific LD machines for SPARQL_LD(R) queries.
We call these machines (P, S, c)'-machines. The (P, S, c)'-machine for a SPARQL_LD(R)
query Q_c^{P,S} implements a generic (i.e. input independent) computation of Q_c^{P,S}. For
any nontrivially satisfiable SPARQL_LD(R) query Q_c^{P,S} we shall see that the corresponding
(P, S, c)'-machine computes Q_c^{P,S}(W) and halts if and only if the (S, c, P)-reachable
part of the given Web of Linked Data W is finite. Formally, we define (P, S, c)'-machines
as follows:



Definition 18. Let S ⊂ U be a finite set of seed URIs; let c be a reachability criterion;
and let P be a SPARQL expression. The (P, S, c)'-machine is an LD machine that
implements Algorithm 2. This algorithm makes use of a subroutine called lookup.
This subroutine, when called with a URI u ∈ U, i) writes enc(u) to the right end of the
word on the link traversal tape, ii) enters the expand state, and iii) performs the expand
procedure as specified in Definition 4.

Before we complete the proof of Lemma 2 we discuss properties of any (P, S, c)'-machine
that are relevant for the proof. The computation of each (P, S, c)'-machine
(with a Web of Linked Data W encoded on its input tape) starts with an initialization
(cf. line 1 in Algorithm 2). After the initialization, the machine enters a (potentially
non-terminating) loop that recursively discovers (i.e. expands) all LD documents of
the corresponding reachable part of W. The following lemma shows that for each such
document there exists an iteration of the loop during which the machine copies that
document to its link traversal tape.

Lemma 8. Let:

- M^{(P,S,c)'} be the (P, S, c)'-machine for a SPARQL expression P, a finite set S ⊂ U,
and a reachability criterion c;

- W = (D, data, adoc) be a (potentially infinite) Web of Linked Data encoded on
the Web tape of M^{(P,S,c)'}; and

- d ∈ D be an LD document that is (c, P)-reachable from S in W.

During the execution of Algorithm 2 by M^{(P,S,c)'} there exists an iteration of the loop
(lines 3 to 5) after which the word on the link traversal tape of M^{(P,S,c)'} (permanently)
contains enc(d).

Proof of Lemma 8. To prove the lemma we first emphasize that M^{(P,S,c)'} only appends
to the word on its link traversal tape. Hence, M^{(P,S,c)'} never removes enc(d) from
that word once it has been added. The same holds for the encoding of any other LD
document d' ∈ D.

Since d is (c, P)-reachable from S in W, the link graph for W contains at least
one finite path (d_0, ..., d_n) of LD documents d_i where i) n ∈ {0, 1, ...}, ii) d_n = d,
iii) ∃ u ∈ S : adoc(u) = d_0, and iv) for each i ∈ {1, ..., n} it holds:

∃ t ∈ data(d_{i-1}) : ( ∃ u ∈ uris(t) : ( adoc(u) = d_i and c(t, u, P) = true ) )     (1)

Let (d*_0, ..., d*_n) be such a path. We use this path to prove the lemma. More precisely,
we show by induction over i ∈ {0, ..., n} that there exists an iteration after which the
word on the link traversal tape of M^{(P,S,c)'} contains enc(d*_n) (which is the same as
enc(d) because d*_n = d).

Base case (i = 0): Since ∃ u ∈ S : adoc(u) = d*_0 it is easy to verify that after the 0-th
iteration (i.e. before the first iteration) the word on the link traversal tape of M^{(P,S,c)'}
contains enc(d*_0) (cf. line 1 in Algorithm 2).

Induction step (i > 0): Our inductive hypothesis is: There exists an iteration after which
the word on the link traversal tape of M^{(P,S,c)'} contains enc(d*_{i-1}). Let this be the j-th
iteration. Based on the hypothesis we show that there exists an iteration after which
the word on the link traversal tape of M^{(P,S,c)'} contains enc(d*_i). We distinguish two
cases: after the j-th iteration the word on the link traversal tape either already contains
enc(d*_i) or it does not contain enc(d*_i). We have to discuss the latter case only.
Due to (1) there exist t* ∈ data(d*_{i-1}) and u* ∈ uris(t*) such that adoc(u*) = d*_i and
c(t*, u*, P) = true. Hence, there exists a δ ∈ N^+ such that M^{(P,S,c)'} finds t* and u*
in the (j+δ)-th iteration. Since M^{(P,S,c)'} calls lookup for u* in that iteration (cf. line 4
in Algorithm 2), the link traversal tape contains enc(d*_i) after that iteration. □

While Lemma 8 shows that Algorithm 2 discovers all reachable LD documents, the
following lemma verifies that the algorithm does not copy data from unreachable
documents to the link traversal tape.

Lemma 9. Let:

- M^{(P,S,c)'} be the (P, S, c)'-machine for a SPARQL expression P, a finite set S ⊂ U,
and a reachability criterion c;

- W = (D, data, adoc) be a (potentially infinite) Web of Linked Data encoded on
the Web tape of M^{(P,S,c)'}; and

- W_c^{(S,P)} denote the (S, c, P)-reachable part of W.

For any RDF triple t encoded on the link traversal tape of M^{(P,S,c)'} it holds (at any
point of the computation): t ∈ AllData(W_c^{(S,P)}).



Proof of Lemma 9. Let w_j denote the word on the link traversal tape of M^{(P,S,c)'}
when M^{(P,S,c)'} finishes the j-th iteration of the loop in Algorithm 2; w_0 denotes the
corresponding word before the first iteration. To prove the lemma it is sufficient to show
that for each w_j (where j ∈ {0, 1, ...}) there exists a finite sequence u_1, ..., u_{n_j} of n_j
different URIs u_i ∈ U (for all i ∈ [1, n_j]) such that i) w_j is

enc(u_1) enc(adoc(u_1)) ♯ ... ♯ enc(u_{n_j}) enc(adoc(u_{n_j})) ♯

and ii) for each i ∈ [1, n_j] either u_i ∉ dom(adoc) (and, thus, adoc(u_i) is undefined)
or adoc(u_i) is an LD document which is (c, P)-reachable from S in W. We use an
induction over j for this proof.

Base case (j = 0): The computation of M^{(P,S,c)'} starts with an empty link traversal
tape (cf. Definition 4). Due to the initialization, w_0 is a concatenation of sub-words
enc(u) enc(adoc(u)) ♯ for all u ∈ S (cf. line 1 in Algorithm 2). Hence, we have a
corresponding sequence u_1, ..., u_{n_0} where n_0 = |S| and ∀ i ∈ [1, n_0] : u_i ∈ S. The
order of the URIs in that sequence depends on the order in which they have been looked
up and is irrelevant for our proof. For all u ∈ S it holds either u ∉ dom(adoc) or
adoc(u) is (c, P)-reachable from S in W (cf. case 1 in Definition 10).

Induction step (j > 0): Our inductive hypothesis is that there exists a finite sequence
u_1, ..., u_{n_{j-1}} of n_{j-1} different URIs (∀ i ∈ [1, n_{j-1}] : u_i ∈ U) such that i) w_{j-1} is

enc(u_1) enc(adoc(u_1)) ♯ ... ♯ enc(u_{n_{j-1}}) enc(adoc(u_{n_{j-1}})) ♯

and ii) for each i ∈ [1, n_{j-1}] either u_i ∉ dom(adoc) or adoc(u_i) is (c, P)-reachable
from S in W. In the j-th iteration M^{(P,S,c)'} finds an RDF triple t encoded as part of
w_{j-1} such that ∃ u ∈ uris(t) : c(t, u, P) = true and lookup has not been called for
u. The machine calls lookup for u, which changes the word on the link traversal tape
to w_j. Hence, w_j is equal to w_{j-1} enc(u) enc(adoc(u)) ♯ and, thus, our sequence of
URIs for w_j is u_1, ..., u_{n_{j-1}}, u. It remains to show that if u ∈ dom(adoc) then adoc(u)
is (c, P)-reachable from S in W.

Assume u ∈ dom(adoc). Since RDF triple t is encoded as part of w_{j-1} we know,
from our inductive hypothesis, that t must be contained in the data of an LD document
d* that is (c, P)-reachable from S in W (and for which there exists i ∈ [1, n_{j-1}] such that
adoc(u_i) = d*). Therefore, t and u satisfy the requirements as given in case 2 of
Definition 10 and, thus, adoc(u) is (c, P)-reachable from S in W. □

After verifying that Algorithm 2 is sound (cf. Lemma 9) and complete (cf. Lemma 8)
w.r.t. discovering reachable LD documents, we now show that an execution of the
algorithm terminates if the corresponding reachable part of the input Web is finite.

Lemma 10. Let:

- M^{(P,S,c)'} be the (P, S, c)'-machine for a SPARQL expression P, a finite set S ⊂ U,
and a reachability criterion c;

- W = (D, data, adoc) be a (potentially infinite) Web of Linked Data encoded on
the Web tape of M^{(P,S,c)'}; and

- W_c^{(S,P)} denote the (S, c, P)-reachable part of W.

The computation of M^{(P,S,c)'} halts after a finite number of steps if W_c^{(S,P)} is finite.

(Footnote to the proof of Lemma 9: We assume enc(adoc(u_i)) is the empty word if
adoc(u_i) is undefined, i.e. u_i ∉ dom(adoc).)

Proof of Lemma 10. Let W_c^{(S,P)} be finite. To show that the computation of M^{(P,S,c)'}
on (Web) input enc(W) halts after a finite number of steps we emphasize the following
facts:

1. Each call of subroutine lookup by M^{(P,S,c)'} terminates because the encoding of
W is ordered following the order of the URIs in dom(adoc).

2. M^{(P,S,c)'} completes the initialization in line 1 of Algorithm 2 after a finite number
of steps because S is finite.

3. At any point in the computation the word on the link traversal tape of M^{(P,S,c)'}
is finite because M^{(P,S,c)'} only gradually appends (encoded) LD documents to
that tape (one document per iteration) and the encoding of each document is finite
(recall that the set of RDF triples data(d) for each LD document d ∈ D is finite).

4. During each iteration of the loop in Algorithm 2, M^{(P,S,c)'} completes the scan of
its link traversal tape (cf. line 4) after a finite number of computation steps because
the word on that tape is always finite. Thus, M^{(P,S,c)'} finishes each iteration of the
loop after a finite number of steps.

5. M^{(P,S,c)'} considers only those URIs for a call of subroutine lookup that i) have
not been considered before and that ii) are mentioned in (RDF triples of) LD
documents from W_c^{(S,P)} (cf. line 4). Since W_c^{(S,P)} is finite there is only a finite number
of such URIs and, thus, the loop in Algorithm 2 as performed by M^{(P,S,c)'} has a
finite number of iterations.

6. Due to the finiteness of the word on the link traversal tape the set G used in line 6 of
Algorithm 2 is finite and, thus, [[P]]_G is finite. As a consequence M^{(P,S,c)'} requires
only a finite number of computation steps for executing line 6.

Altogether, these facts prove Lemma 10. □

We now prove Lemma 2. Let:

- Q_c^{P,S} be a SPARQL_LD(R) query that is nontrivially satisfiable;

- W be a (potentially infinite) Web of Linked Data; and

- W_c^{(S,P)} = (D_R, data_R, adoc_R) denote the (S, c, P)-reachable part of W.

If: Let W_c^{(S,P)} be finite. We have to show that there exists an LD machine that computes
Q_c^{P,S}(W) and halts after a finite number of computation steps. Based on Lemmas 8
to 10 it is easy to verify that the (P, S, c)'-machine (for P, S and c as used by Q_c^{P,S}) is
such a machine.

Only if: Let M be an LD machine (not necessarily a (P, S, c)'-machine) that computes
Q_c^{P,S}(W) and halts after a finite number of computation steps. We have to show that
W_c^{(S,P)} is finite. We show this by contradiction, that is, we assume W_c^{(S,P)} is infinite.
In this case D_R is infinite. Since Q_c^{P,S} is nontrivially satisfiable it is possible that W
is a Web of Linked Data for which there exist solutions in W_c^{(S,P)} such that each of these
solutions provides a binding for at least one variable. Hence, for computing Q_c^{P,S} over
W completely, machine M must (recursively) expand the word on its link traversal
tape until it contains the encodings of (at least) each LD document in D_R. Such an
expansion is necessary to ensure that the computed query result is complete. Since D_R
is infinite the expansion requires infinitely many computation steps. However, we know
that M halts after a finite number of computation steps. Hence, we have a contradiction
and, thus, W_c^{(S,P)} must be finite.



C.22 Proof of Proposition 8

Let c_ef be a reachability criterion that ensures finiteness. To prove that all SPARQL_LD(R)
queries under c_ef-semantics are finitely computable we have to show that for each such
query there exists an LD machine that computes the query over any Web of Linked Data
and halts after a finite number of computation steps (with an encoding of the complete
query result on its output tape). W.l.o.g., let Q_{c_ef}^{P,S} be such a SPARQL_LD(R) query (under
c_ef-semantics). Based on Lemmas 8 to 10 (cf. Section C.21) it is easy to verify that
the (P, S, c_ef)'-machine (for P, S and c_ef as used by Q_{c_ef}^{P,S}) is such a machine (notice
that Lemmas 8 to 10 are not restricted to SPARQL_LD(R) queries that are nontrivially
satisfiable).



C.23 Proof of Theorem 5

Let c_nf be a reachability criterion that does not ensure finiteness. To prove Theorem 5
we distinguish three cases for a satisfiable SPARQL_LD(R) query Q_{c_nf}^{P,S} under
c_nf-semantics:

1. The (S, c_nf, P)-reachable part of any Web of Linked Data is finite (which is
possible even if c_nf does not ensure finiteness for all SPARQL_LD(R) queries under
c_nf-semantics).

2. The (S, c_nf, P)-reachable part of some Web of Linked Data is infinite and Q_{c_nf}^{P,S} is
monotonic.

3. The (S, c_nf, P)-reachable part of some Web of Linked Data is infinite and Q_{c_nf}^{P,S} is
not monotonic.

In the following we discuss each of these cases.

Case (1): Let Q_{c_nf}^{P',S'} be a satisfiable SPARQL_LD(R) query (under c_nf-semantics) such
that the (S', c_nf, P')-reachable part of any Web of Linked Data is finite. We claim that in
this case Q_{c_nf}^{P',S'} is finitely computable (independent of its monotonicity). To prove this
claim we use the same argument that we use for proving Proposition 8 in Section C.22.
Based on Lemmas 8 to 10 (cf. Section C.21) and the fact that the (S', c_nf, P')-reachable
part of any Web of Linked Data is finite, it is easy to verify that the (P', S', c_nf)'-machine
(for P', S' and c_nf as used by Q_{c_nf}^{P',S'}) is an LD machine that computes Q_{c_nf}^{P',S'}
over any Web of Linked Data W and halts after a finite number of computation steps
(with an encoding of Q_{c_nf}^{P',S'}(W) on its output tape). Hence, the (P', S', c_nf)'-machine
satisfies the requirements in Definition 5 and, thus, Q_{c_nf}^{P',S'} is finitely computable.



Algorithm 3 The program of the (P, S, c)-machine.



1: Call lookup for each u ∈ S.

2: for j = 1, 2, ... do

3: Let T_j denote the set of all RDF triples currently encoded on the link traversal tape. Use
the work tape to enumerate the set [[P]]_{T_j}.

4: For each μ ∈ [[P]]_{T_j} check whether μ is already encoded on the output tape; if this is
not the case, then add enc(μ) to the output.

5: Scan the link traversal tape for an RDF triple t that contains a URI u ∈ uris(t) such
that i) c(t, u, P) = true and ii) the word on the link traversal tape neither contains
enc(u) enc(adoc(u)) ♯ nor enc(u) ♯. If such t and u exist, call lookup for u;
otherwise halt the computation.

6: end for
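The incremental behaviour of Algorithm 3 can be sketched as a Python generator. This is an illustration under assumptions, not the machine itself: tapes are modelled as sets, P is restricted to a single triple pattern, every triple component counts as a URI, and the Web is given by a hypothetical function `deref` so that an infinite Web can be generated lazily. All function names are illustrative.

```python
from itertools import islice


def eval_triple_pattern(tp, triples):
    """Evaluate one triple pattern; variables are strings starting with '?'."""
    solutions = set()
    for t in triples:
        mu = {}
        for term, value in zip(tp, t):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.add(frozenset(mu.items()))
    return solutions


def run_machine(pattern, seeds, c, deref):
    """Yield solutions as soon as the data retrieved so far supports them."""
    tape, seen, emitted = set(), set(), set()

    def lookup(u):
        seen.add(u)
        triples = deref(u)  # None models a URI outside dom(adoc)
        if triples is not None:
            tape.update(triples)

    for u in sorted(seeds):                                 # line 1
        lookup(u)
    while True:                                             # line 2
        new = eval_triple_pattern(pattern, tape) - emitted  # lines 3-4
        emitted |= new
        yield from sorted(new, key=sorted)
        candidate = next(                                   # line 5
            (u for t in sorted(tape) for u in sorted(set(t))
             if c(t, u, pattern) and u not in seen),
            None,
        )
        if candidate is None:
            return
        lookup(candidate)                                   # one lookup per iteration


def infinite_chain(u):
    """Hypothetical data-generating server: u0 -> u1 -> u2 -> ..."""
    if u.startswith("u") and u[1:].isdigit():
        return {(u, "next", "u" + str(int(u[1:]) + 1))}
    return None
```

On the infinite chain the generator never terminates by itself, but a consumer can take any finite prefix of the result, for example with `islice(run_machine(("?a", "next", "?b"), {"u0"}, lambda t, u, p: True, infinite_chain), 3)`. Emitting early like this is only sound for monotonic queries, which a single triple pattern is.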



Case (2): Let Q_{c_nf}^{P',S'} be a satisfiable, monotonic SPARQL_LD(R) query (under
c_nf-semantics) for which there exists a Web of Linked Data W such that the (S', c_nf, P')-reachable
part of W is infinite. To show that Q_{c_nf}^{P',S'} is (at least) eventually computable we
introduce specific LD machines for SPARQL_LD(R) queries. We call these machines (P, S, c)-machines.
The (P, S, c)-machine for a SPARQL_LD(R) query Q_c^{P,S} implements a generic
(i.e. input independent) computation of Q_c^{P,S}. We shall see that if a SPARQL_LD(R) query
Q_c^{P,S} is monotonic, the corresponding (P, S, c)-machine (eventually) computes Q_c^{P,S}
over any Web of Linked Data. We emphasize that (P, S, c)-machines differ from the
(P, S, c)'-machines that we use for proving Lemma 2 (cf. Section C.21). Formally, we
define (P, S, c)-machines as follows:

Definition 19. Let S ⊂ U be a finite set of seed URIs; let c be a reachability criterion;
and let P be a SPARQL expression. The (P, S, c)-machine is an LD machine that
implements Algorithm 3. This algorithm makes use of a subroutine called lookup. This
subroutine, when called with a URI u ∈ U, i) writes enc(u) to the right end of the
word on the link traversal tape, ii) enters the expand state, and iii) performs the expand
procedure as specified in Definition 4.

As can be seen in Algorithm 3, the computation of each (P, S, c)-machine (with a Web
of Linked Data W encoded on its input tape) starts with an initialization (cf. line 1).
After the initialization, the machine enters a (potentially non-terminating) loop. During
each iteration of this loop, the machine generates valuations using all data that is
currently encoded on the link traversal tape. The following proposition shows that these
valuations are part of the corresponding query result (find the proof for Proposition 10
below in Section C.24):

Proposition 10. Let:

- Q_c^{P,S} be a SPARQL_LD(R) query that is monotonic;

- M^{(P,S,c)} denote the (P, S, c)-machine for P, S, and c as used by Q_c^{P,S}; and

- W be an arbitrary Web of Linked Data encoded on the Web tape of M^{(P,S,c)}.

During the execution of Algorithm 3 by M^{(P,S,c)} it holds:

∀ j ∈ {1, 2, ...} : [[P]]_{T_j} ⊆ Q_c^{P,S}(W)



Proposition 10 presents the basis to prove the soundness of (monotonic) query results
computed by Algorithm 3. To verify the completeness of these results it is important
to note that (P, S, c)-machines look up no more than one URI per iteration (cf. line 5
in Algorithm 3). Hence, (P, S, c)-machines prioritize result construction over data
retrieval. Due to this feature we can show that for each solution in a query result there
exists an iteration during which that solution is computed (find the proof for
Proposition 11 below in Section C.25):

Proposition 11. Let:

- Q_c^{P,S} be a SPARQL_LD(R) query that is monotonic;

- M^{(P,S,c)} denote the (P, S, c)-machine for P, S, and c as used by Q_c^{P,S}; and

- W be an arbitrary Web of Linked Data encoded on the Web tape of M^{(P,S,c)}.

For each μ ∈ Q_c^{P,S}(W) there exists a j_μ ∈ {1, 2, ...} such that during the execution of
Algorithm 3 by M^{(P,S,c)} it holds:

∀ j ∈ {j_μ, j_μ + 1, ...} : μ ∈ [[P]]_{T_j}

So far our results verify that i) the set of query solutions computed after any iteration is
sound and ii) that this set is complete after a particular (potentially infinite) number of
iterations. We now show that each iteration definitely finishes after a finite number of
computation steps (find the proof for Proposition 12 below in Section C.26):

Proposition 12. Let:

- M^{(P,S,c)} be the (P, S, c)-machine for a SPARQL expression P, a finite set S ⊂ U,
and a reachability criterion c; and

- W be a (potentially infinite) Web of Linked Data encoded on the Web tape of
M^{(P,S,c)}.

During the execution of Algorithm 3, M^{(P,S,c)} finishes each iteration of the loop in that
algorithm after a finite number of computation steps.

Altogether, Propositions 10 to 12 conclude the discussion of case (2), that is, based on
these propositions it is easy to verify that the (P', S', c_nf)-machine for our query Q_{c_nf}^{P',S'}
satisfies the requirements in Definition 6 and, thus, Q_{c_nf}^{P',S'} is eventually computable.

Case (3): Let Q_{c_nf}^{P',S'} be a satisfiable, non-monotonic SPARQL_LD(R) query (under
c_nf-semantics) for which there exists a Web of Linked Data W such that the (S', c_nf, P')-reachable
part of W is infinite. To show that Q_{c_nf}^{P',S'} may not even be eventually computable,
we assume Q_{c_nf}^{P',S'}(W) ≠ ∅.

For the proof we use the same argument that we use in the corresponding discussion
for non-monotonic SPARQL_LD queries (see the proof of Theorem 1 in Section C.5).
Hence, we show a contradiction by assuming Q_{c_nf}^{P',S'} were (at least) eventually
computable, that is, we assume an LD machine M (which is not necessarily a (P, S, c)-machine)
whose computation of Q_{c_nf}^{P',S'} on any Web of Linked Data has the two
properties given in Definition 6. To obtain a contradiction we show that such a machine does
not exist.

Let W be a Web of Linked Data such that the (S', c_nf, P')-reachable part of W is
infinite; such a Web exists for case (3). In the remainder of this proof we write W_R to
denote the (S', c_nf, P')-reachable part of W.

Let W be encoded on the Web tape of M and let μ be an arbitrary solution for
Q_{c_nf}^{P',S'} in W; i.e. μ ∈ Q_{c_nf}^{P',S'}(W). Based on our assumption, machine M must write
enc(μ) to its output tape after a finite number of computation steps (cf. property 2
in Definition 6). We argue that this is impossible: Since Q_{c_nf}^{P',S'} is not monotonic, M
cannot add μ to the output before M has accessed all LD documents in W_R (i.e. all LD
documents that are (c_nf, P')-reachable from S' in W). However, due to the infiniteness
of W_R, there is an infinite number of such documents. Therefore, accessing all these
documents is a non-terminating process and, thus, M cannot write μ to its output after
a finite number of computation steps. As a consequence, the computation of Q_{c_nf}^{P',S'}
(over W) by M does not have the properties given in Definition 6, which contradicts
our initial assumption. Due to this contradiction we may conclude that Q_{c_nf}^{P',S'} is not
eventually computable.
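Why a non-monotonic query prevents early output can be illustrated with an OPT expression evaluated over a growing set of triples. The sketch below is an assumption-laden toy: it handles only an expression of the form (tp1 OPT tp2) with two triple patterns, and the helper names `match`, `compatible`, and `eval_opt` as well as the example data are hypothetical.

```python
def match(tp, triples):
    """All valuations that map triple pattern `tp` to a triple in `triples`."""
    found = []
    for t in sorted(triples):
        mu = {}
        for term, value in zip(tp, t):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            found.append(mu)
    return found


def compatible(m1, m2):
    """Two valuations are compatible if they agree on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def eval_opt(tp1, tp2, triples):
    """Evaluate (tp1 OPT tp2) as a left outer join of the two pattern results."""
    result = []
    for m1 in match(tp1, triples):
        joined = [{**m1, **m2} for m2 in match(tp2, triples) if compatible(m1, m2)]
        result.extend(joined if joined else [m1])
    return result
```

With only the triple (a, knows, b) retrieved, the expression ((?x, knows, ?y) OPT (?y, mbox, ?m)) yields the valuation {?x: a, ?y: b}. Once (b, mbox, m1) is also retrieved, that valuation is no longer a solution; it is replaced by the extended one. A machine that had already written the first valuation to its output would therefore have produced an incorrect result.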

C.24 Proof of Proposition 10

Let:

- Q_c^{P,S} be a SPARQL_LD(R) query that is monotonic;

- M^{(P,S,c)} denote the (P, S, c)-machine for P, S, and c as used by Q_c^{P,S}; and

- W = (D, data, adoc) be an arbitrary Web of Linked Data encoded on the Web
tape of M^{(P,S,c)}.

To prove Proposition 10 we use the following lemma.

Lemma 11. During the execution of Algorithm 3 by M^{(P,S,c)} on (Web) input enc(W)
it holds ∀ j ∈ {1, 2, ...} : T_j ⊆ AllData(W_c^{(S,P)}).

Proof of Lemma 11. This proof resembles the proof of the corresponding lemma for
M^{(P,S,c)'} machines (cf. Lemma 9 in Section C.21). Let w_j be the word on the link
traversal tape of M^{(P,S,c)} when M^{(P,S,c)} starts the j-th iteration of the main processing
loop in Algorithm 3 (i.e. before line 3).

To prove ∀ j ∈ {1, 2, ...} : T_j ⊆ AllData(W_c^{(S,P)}) it is sufficient to show that for
each w_j (where j ∈ {1, 2, ...}) there exists a finite sequence u_1, ..., u_{n_j} of n_j different URIs
u_i ∈ U (where i ∈ [1, n_j]) such that i) w_j is

enc(u_1) enc(adoc(u_1)) ♯ ... ♯ enc(u_{n_j}) enc(adoc(u_{n_j})) ♯

and ii) for each i ∈ [1, n_j] either u_i ∉ dom(adoc) (and, thus, adoc(u_i) is undefined)
or adoc(u_i) is an LD document which is (c, P)-reachable from S in W. We use an
induction over j for this proof.

Base case (j = 1): The computation of M^{(P,S,c)} starts with an empty link traversal
tape (cf. Definition 4). Due to the initialization, w_1 is a concatenation of sub-words
enc(u) enc(adoc(u)) ♯ for all u ∈ S (cf. line 1 in Algorithm 3). Hence, we have a
corresponding sequence u_1, ..., u_{n_1} where n_1 = |S| and ∀ i ∈ [1, n_1] : u_i ∈ S. The
order of the URIs in that sequence depends on the order in which they have been looked
up and is irrelevant for our proof. For all u ∈ S it holds either u ∉ dom(adoc) or
adoc(u) is (c, P)-reachable from S in W (cf. case 1 in Definition 10).

(Footnote to the proof of Lemma 11: We, again, assume enc(adoc(u_i)) is the empty word
if u_i ∉ dom(adoc).)

Induction step (j > 1): Our inductive hypothesis is that there exists a finite sequence
u_1, ..., u_{n_{j-1}} of n_{j-1} different URIs (∀ i ∈ [1, n_{j-1}] : u_i ∈ U) such that i) w_{j-1} is

enc(u_1) enc(adoc(u_1)) ♯ ... ♯ enc(u_{n_{j-1}}) enc(adoc(u_{n_{j-1}})) ♯

and ii) for each i ∈ [1, n_{j-1}] either u_i ∉ dom(adoc) or adoc(u_i) is (c, P)-reachable
from S in W. In the (j-1)-th iteration M^{(P,S,c)} finds an RDF triple t encoded as part of
w_{j-1} such that ∃ u ∈ uris(t) : c(t, u, P) = true and lookup has not been called for
u. The machine calls lookup for u, which changes the word on the link traversal tape
to w_j. Hence, w_j is equal to w_{j-1} enc(u) enc(adoc(u)) ♯ and, thus, our sequence of
URIs for w_j is u_1, ..., u_{n_{j-1}}, u. It remains to show that if u ∈ dom(adoc) then adoc(u)
is (c, P)-reachable from S in W.

Assume u ∈ dom(adoc). Since RDF triple t is encoded as part of w_{j-1} we know,
from our inductive hypothesis, that t must be contained in the data of an LD document
d* that is (c, P)-reachable from S in W (and for which there exists i ∈ [1, n_{j-1}] such that
adoc(u_i) = d*). Therefore, t and u satisfy the requirements as given in case 2 of
Definition 10 and, thus, adoc(u) is (c, P)-reachable from S in W. □

Due to the monotonicity of Q_c^{P,S} it is trivial to show Proposition 10 using Lemma 11.

C.25 Proof of Proposition 11

Let:

- Q_c^{P,S} be a SPARQL_LD(R) query that is monotonic;

- M^{(P,S,c)} denote the (P, S, c)-machine for P, S, and c as used by Q_c^{P,S}; and

- W = (D, data, adoc) be an arbitrary Web of Linked Data encoded on the Web
tape of M^{(P,S,c)}.

To prove Proposition 11 we use the following lemma.

Lemma 12. For each RDF triple t ∈ AllData(W_c^{(S,P)}) there exists a j_t ∈ {1, 2, ...} such
that during the execution of Algorithm 3 by M^{(P,S,c)} on (Web) input enc(W) it holds
∀ j ∈ {j_t, j_t + 1, ...} : t ∈ T_j.

Proof of Lemma 12. Let w_j be the word on the link traversal tape of M^{(P,S,c)} when
M^{(P,S,c)} starts the j-th iteration of the main processing loop in Algorithm 3 (i.e. before
line 3).

W.l.o.g., let t' be an arbitrary RDF triple t' ∈ AllData(W_c^{(S,P)}). There must exist
an LD document d ∈ D such that i) t' ∈ data(d) and ii) d is (c, P)-reachable from S
in W. Let d' be such a document. Since M^{(P,S,c)} only appends to its link traversal tape
we prove that there exists a j_{t'} ∈ {1, 2, ...} with ∀ j ∈ {j_{t'}, j_{t'} + 1, ...} : t' ∈ T_j by
showing that there exists j_{t'} ∈ {1, 2, ...} such that w_{j_{t'}} contains the sub-word enc(d').
This proof resembles the proof of the corresponding lemma for M^{(P,S,c)'} machines
(cf. Lemma 8 in Section C.21).

Since d' is (c, P)-reachable from S in W, the link graph for W contains at least
one finite path (d_0, ..., d_n) of LD documents d_i where i) n ∈ {0, 1, ...}, ii) d_n = d',
iii) ∃ u ∈ S : adoc(u) = d_0, and iv) for each i ∈ {1, ..., n} it holds:

∃ t ∈ data(d_{i-1}) : ( ∃ u ∈ uris(t) : ( adoc(u) = d_i and c(t, u, P) = true ) )     (2)

Let (d*_0, ..., d*_n) be such a path. We use this path for our proof. More precisely, we show
by induction over i ∈ {0, ..., n} that there exists j_{t'} ∈ {1, 2, ...} such that w_{j_{t'}} contains
the sub-word enc(d*_n) (which is the same as enc(d') because d*_n = d').

Base case (i = 0): Since ∃ u ∈ S : adoc(u) = d*_0 it is easy to verify that w_1 contains
the sub-word enc(d*_0) (cf. line 1 in Algorithm 3).

Induction step (i > 0): Our inductive hypothesis is: There exists j ∈ {1, 2, ...} such that
w_j contains the sub-word enc(d*_{i-1}). Based on the hypothesis we show that there exists a
j' ∈ {j, j + 1, ...} such that w_{j'} contains the sub-word enc(d*_i). We distinguish two
cases: either enc(d*_i) is already contained in w_j or it is not contained in w_j. In the first
case we have j' = j; in the latter case we have j' > j. We have to discuss the latter case
only. Due to (2) there exist t* ∈ data(d*_{i-1}) and u* ∈ uris(t*) such that adoc(u*) = d*_i and
c(t*, u*, P) = true. Hence, there exists a δ ∈ N^+ such that M^{(P,S,c)} finds t* and u* in
the (j+δ)-th iteration. Since M^{(P,S,c)} calls lookup for u* in that iteration (cf. line 5
in Algorithm 3), it holds that w_{j+δ+1} contains enc(d*_i) and, thus, j' = j + δ + 1. □

We now prove Proposition 11 by induction over the structure of possible SPARQL
expressions. This proof resembles the proof of Lemma @](cf. Section ICTl l. 

Base case: Assume that SPARQL expression P is a triple pattern tp. W.l.o.g., let
$\mu \in \mathcal{Q}_c^{P,S}(W)$. It holds $\mathrm{dom}(\mu) = \mathrm{vars}(tp)$ and $t = \mu[tp] \in \mathrm{AllData}(W_c^{(S,P)})$
(cf. Definitions 12 and 13). According to Lemma 12 there exists a $j_\mu \in \{1, 2, ...\}$ such
that $\forall j \in \{j_\mu, j_\mu + 1, ...\} : t \in T_j$. Since $\mathcal{Q}_c^{P,S}$ is monotonic we conclude
$\forall j \in \{j_\mu, j_\mu + 1, ...\} : \mu \in [\![P]\!]_{T_j}$.

Induction step: Our inductive hypothesis is that for SPARQL expressions $P_1$ and $P_2$ it
holds:

1. For each $\mu \in \mathcal{Q}_c^{P_1,S}(W)$ there exists a $j_\mu \in \{1, 2, ...\}$ such that during the execution of
Algorithm 3 by $M^{(P_1,S,c)}$ it holds $\forall j \in \{j_\mu, j_\mu + 1, ...\} : \mu \in [\![P_1]\!]_{T_j}$; and

2. For each $\mu \in \mathcal{Q}_c^{P_2,S}(W)$ there exists a $j_\mu \in \{1, 2, ...\}$ such that during the execution of
Algorithm 3 by $M^{(P_2,S,c)}$ it holds $\forall j \in \{j_\mu, j_\mu + 1, ...\} : \mu \in [\![P_2]\!]_{T_j}$.

Based on this hypothesis we show that for any SPARQL expression P that can be
constructed using $P_1$ and $P_2$ it holds: For each $\mu \in \mathcal{Q}_c^{P,S}(W)$ there exists a $j_\mu \in \{1, 2, ...\}$
such that during the execution of Algorithm 3 by $M^{(P,S,c)}$ it holds $\forall j \in \{j_\mu, j_\mu + 1, ...\} :
\mu \in [\![P]\!]_{T_j}$. W.l.o.g., let $\mu' \in \mathcal{Q}_c^{P,S}(W)$. According to Definition 14 we distinguish the
following cases:

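- P is ($P_1$ AND $P_2$). In this case there exist $\mu_1 \in \mathcal{Q}_c^{P_1,S}(W)$ and $\mu_2 \in \mathcal{Q}_c^{P_2,S}(W)$ such that
$\mu' = \mu_1 \cup \mu_2$ and $\mu_1 \sim \mu_2$. According to our inductive hypothesis there exist $j_{\mu_1}, j_{\mu_2} \in
\{1, 2, ...\}$ such that i) $\forall j \in \{j_{\mu_1}, j_{\mu_1} + 1, ...\} : \mu_1 \in [\![P_1]\!]_{T_j}$ and ii) $\forall j \in \{j_{\mu_2}, j_{\mu_2} +
1, ...\} : \mu_2 \in [\![P_2]\!]_{T_j}$. Let $j_{\mu'} = \max(\{j_{\mu_1}, j_{\mu_2}\})$. Due to the monotonicity of $\mathcal{Q}_c^{P,S}$
it holds $\forall j \in \{j_{\mu'}, j_{\mu'} + 1, ...\} : \mu' \in [\![P]\!]_{T_j}$.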

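- P is ($P_1$ FILTER R). In this case there exists $\mu^* \in \mathcal{Q}_c^{P_1,S}(W)$ such that $\mu' = \mu^*$. According
to our inductive hypothesis there exists $j_{\mu^*} \in \{1, 2, ...\}$ such that $\forall j \in \{j_{\mu^*}, j_{\mu^*} + 1, ...\} :
\mu^* \in [\![P_1]\!]_{T_j}$. Due to the monotonicity of $\mathcal{Q}_c^{P,S}$ it holds $\forall j \in \{j_{\mu^*}, j_{\mu^*} + 1, ...\} :
\mu' \in [\![P]\!]_{T_j}$.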

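- P is ($P_1$ OPT $P_2$). We distinguish two cases:

1. There exist $\mu_1 \in \mathcal{Q}_c^{P_1,S}(W)$ and $\mu_2 \in \mathcal{Q}_c^{P_2,S}(W)$ such that $\mu' = \mu_1 \cup \mu_2$
and $\mu_1 \sim \mu_2$. This case corresponds to the case where P is ($P_1$ AND $P_2$) (see
above).

2. There exists $\mu_1 \in \mathcal{Q}_c^{P_1,S}(W)$ such that $\mu' = \mu_1$ and $\forall \mu_2 \in \mathcal{Q}_c^{P_2,S}(W) :
\mu_1 \not\sim \mu_2$. According to our inductive hypothesis there exists $j_{\mu_1} \in \{1, 2, ...\}$ such
that $\forall j \in \{j_{\mu_1}, j_{\mu_1} + 1, ...\} : \mu_1 \in [\![P_1]\!]_{T_j}$. Due to the monotonicity of $\mathcal{Q}_c^{P,S}$ it
holds $\forall j \in \{j_{\mu_1}, j_{\mu_1} + 1, ...\} : \mu' \in [\![P]\!]_{T_j}$.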

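- P is ($P_1$ UNION $P_2$). We distinguish two cases:

1. There exists $\mu^* \in \mathcal{Q}_c^{P_1,S}(W)$ such that $\mu' = \mu^*$. According to our inductive
hypothesis there exists $j_{\mu^*} \in \{1, 2, ...\}$ such that $\forall j \in \{j_{\mu^*}, j_{\mu^*} + 1, ...\} : \mu^* \in
[\![P_1]\!]_{T_j}$.

2. There exists $\mu^* \in \mathcal{Q}_c^{P_2,S}(W)$ such that $\mu' = \mu^*$. According to our inductive
hypothesis there exists $j_{\mu^*} \in \{1, 2, ...\}$ such that $\forall j \in \{j_{\mu^*}, j_{\mu^*} + 1, ...\} : \mu^* \in
[\![P_2]\!]_{T_j}$.

Due to the monotonicity of $\mathcal{Q}_c^{P,S}$ it holds for both cases: $\forall j \in \{j_{\mu^*}, j_{\mu^*} + 1, ...\} :
\mu' \in [\![P]\!]_{T_j}$.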
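
The AND case of the induction can be checked on a toy example. The sketch below is an illustration under simplifying assumptions, not part of the original proof: triples and valuations are plain Python tuples and dictionaries, variables are strings starting with `?`, and the helper names `match`, `compatible`, `join`, and `first_index` are hypothetical. It shows that a joined solution appears no later than the maximum of the indices at which its two parts appear, and that it stays in the result afterwards.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Valuation = Dict[str, str]


def match(tp: Triple, triples: Set[Triple]) -> List[Valuation]:
    """Solutions of a single triple pattern over a set of triples."""
    solutions = []
    for t in sorted(triples):
        mu: Valuation = {}
        ok = True
        for x, y in zip(tp, t):
            if x.startswith("?"):
                if mu.get(x, y) != y:   # same variable bound to two values
                    ok = False
                    break
                mu[x] = y
            elif x != y:                # constant term does not match
                ok = False
                break
        if ok:
            solutions.append(mu)
    return solutions


def compatible(m1: Valuation, m2: Valuation) -> bool:
    """Two valuations are compatible if they agree on shared variables."""
    return all(m2.get(k, v) == v for k, v in m1.items())


def join(o1: List[Valuation], o2: List[Valuation]) -> List[Valuation]:
    """Evaluation of (P1 AND P2) from the solutions of P1 and P2."""
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]


def first_index(snapshots: List[Set[Triple]], mu: Valuation,
                evaluate: Callable[[Set[Triple]], List[Valuation]]) -> Optional[int]:
    """Smallest j (1-based) such that mu is a solution over T_j, if any."""
    for j, t_j in enumerate(snapshots, start=1):
        if mu in evaluate(t_j):
            return j
    return None


# Growing snapshots T_1, T_2, T_3 of the data on the link traversal tape.
T1 = {("a", "p", "b")}
T2 = T1 | {("b", "q", "c")}
T3 = T2 | {("c", "r", "d")}
snapshots = [T1, T2, T3]
tp1, tp2 = ("?x", "p", "?y"), ("?y", "q", "?z")

j1 = first_index(snapshots, {"?x": "a", "?y": "b"}, lambda T: match(tp1, T))
j2 = first_index(snapshots, {"?y": "b", "?z": "c"}, lambda T: match(tp2, T))
j_joined = first_index(snapshots, {"?x": "a", "?y": "b", "?z": "c"},
                       lambda T: join(match(tp1, T), match(tp2, T)))
```

In this example `j_joined` coincides with `max(j1, j2)`; in general the maximum is only an upper bound on the first index at which the joined solution appears.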

C.26 Proof of Proposition 12

Let: 

- $M^{(P,S,c)}$ be the (P, S, c)-machine for a SPARQL expression P, a finite set $S \subset \mathcal{U}$,
and a reachability criterion c; and

- W be a (potentially infinite) Web of Linked Data encoded on the Web tape of
$M^{(P,S,c)}$.

To prove that $M^{(P,S,c)}$ finishes each iteration of the loop in Algorithm 3 after a finite
number of computation steps, we first emphasize the following facts:

1. Each call of subroutine lookup by $M^{(P,S,c)}$ terminates because the encoding of
W is ordered following the order of the URIs in dom(adoc).

2. $M^{(P,S,c)}$ completes the initialization in line 1 of Algorithm 3 after a finite number
of steps because S is finite.



3. At any point in the computation the word on the link traversal tape of $M^{(P,S,c)}$ is
finite because $M^{(P,S,c)}$ only gradually appends (encoded) LD documents to that tape
(one document per iteration) and the encoding of each document is finite (recall that
the set of RDF triples data(d) for each LD document d is finite).

It remains to show that each iteration of the loop also only requires a finite number of
computation steps: Due to the finiteness of the word on the link traversal tape, each
$[\![P]\!]_{T_j}$ (for j = 1, 2, ...) is finite, resulting in a finite number of computation steps for
lines 3 and 4 during any iteration. The scan in line 5 also finishes after a finite number
of computation steps because of the finiteness of the word on the link traversal tape.

C.27 Proof of Theorem 3

We formally define the termination problem for SPARQL_LD(R) as follows:

Problem:        Termination(SPARQL_LD(R))
Web Input:      a (potentially infinite) Web of Linked Data W
Ordinary Input: a finite but nonempty set $S \subset \mathcal{U}$ of seed URIs
                a reachability criterion $c_{nf}$ that does not ensure finiteness
                a SPARQL expression P
Question:       Does an LD machine exist that computes $\mathcal{Q}_{c_{nf}}^{P,S}(W)$ and halts?



To prove that Termination(SPARQL_LD(R)) is not LD machine decidable we reduce
the halting problem to Termination(SPARQL_LD(R)). For this reduction we use the
same argumentation, including the same Web of Linked Data, that we use for proving
Theorem 2 (cf. Section C.9).

We define the mapping f from input for the halting problem to input for
Termination(SPARQL_LD(R)) as follows: Let (w, x) be an input to the halting problem, that
is, w is the description of a Turing machine M(w) and x is a possible input word for
M(w); then $f(w, x) = (W_{TMs}, S_{w,x}, c_{All}, P_{w,x})$ where:



- $W_{TMs}$ is the Web of Linked Data used in the proof of Theorem 2,

- $S_{w,x} = \{u_1^{w,x}\}$ (recall, $u_1^{w,x}$ denotes a URI that identifies the first step in the
computation of M(w) on input x), and

- $P_{w,x} = (u^{w,x}, \mathrm{type}, \mathrm{TerminatingComputation})$.

As before, f is computable by Turing machines (including LD machines).

To show that Termination(SPARQL_LD(R)) is not LD machine decidable, suppose
it were LD machine decidable. In such a case an LD machine could answer the
halting problem for any input (w, x) as follows: M(w) halts on x if and only if an LD
machine exists that computes $\mathcal{Q}_{c_{All}}^{P_{w,x},S_{w,x}}(W_{TMs})$ and halts. However, we know the halting
problem is undecidable for TMs (which includes LD machines). Hence, we have a
contradiction and, thus, Termination(SPARQL_LD(R)) cannot be LD machine decidable.

C.28 Proof of Proposition in 

Let: 



- $\mathcal{Q}_{c_{nf}}^{P,S}$ be a SPARQL_LD(R) query that uses a finite, nonempty set $S \subset \mathcal{U}$ of seed URIs
and a reachability criterion $c_{nf}$ which does not ensure finiteness; and

- $G_1$, $G_2$ be an arbitrary pair of sets of RDF triples such that $G_1 \subseteq G_2$.

Assume $\mathcal{Q}_{c_{nf}}^{P,S}$ is monotonic. We have to show that the SPARQL expression P (used
by $\mathcal{Q}_{c_{nf}}^{P,S}$) is monotonic as well. We distinguish two cases: either P is satisfiable or P
is not satisfiable. In the latter case P is trivially monotonic. Hence, we only have to
discuss the first case. To prove that (the satisfiable) P is monotonic it suffices to show
$[\![P]\!]_{G_1} \subseteq [\![P]\!]_{G_2}$. For this proof we construct two Webs of Linked Data $W_1$ and $W_2$
such that i) $W_1$ is an induced subweb of $W_2$ and ii) the data of $G_1$ and $G_2$ is distributed
over $W_1$ and $W_2$, respectively. Using $W_1$ and $W_2$ we show the monotonicity of P based
on the monotonicity of $\mathcal{Q}_{c_{nf}}^{P,S}$.

To construct $W_1$ and $W_2$ we have to address two problems: First, we cannot simply
construct $W_1$ and $W_2$ as Webs of Linked Data that consist of single LD documents
which contain all RDF triples of $G_1$ and $G_2$ because $G_1$ and $G_2$ may be (countably)
infinite, whereas the data in each LD document of a Web of Linked Data must be
finite. Recall the corresponding proof for SPARQL_LD where we have the same problem
(cf. Section C.4); we shall use the same strategy for solving that problem in this proof.
The second problem, however, is specific to the case of reachability-based semantics:
The construction of $W_1$ and $W_2$ for SPARQL_LD(R) queries has to ensure that all LD
documents which contain RDF triples of $G_1$ and $G_2$ are reachable. Due to this issue
the construction is more complex than the corresponding construction for the full-Web
semantics case.

To solve the first problem we construct $W_1$ and $W_2$ as Webs that contain an LD
document for each RDF triple in $G_1$ and $G_2$, respectively. However, by distributing
the RDF triples from (the potentially infinite) $G_1$ over multiple LD documents in a
constructed Web, we may lose certain solutions $\mu \in [\![P]\!]_{G_1}$ because the data of each
LD document in a Web of Linked Data must use a unique set of blank nodes. The same
holds for $G_2$. To avoid this issue we assume a mapping $\varrho$ that maps each blank node
in $G_2$ to a new, unique URI. To define $\varrho$ formally, we let B denote the set of blank
nodes in $G_2$, that is, $B = \mathrm{terms}(G_2) \cap \mathcal{B}$. Furthermore, we assume a set $U_B \subset \mathcal{U}$
such that $|U_B| = |B|$ and $U_B \cap \mathrm{terms}(G_2) = \emptyset$. Now, $\varrho$ is a total, bijective mapping
$\varrho : ((\mathcal{U} \cup \mathcal{B} \cup \mathcal{L}) \setminus U_B) \to ((\mathcal{U} \cup \mathcal{B} \cup \mathcal{L}) \setminus B)$ that, for any $x \in ((\mathcal{U} \cup \mathcal{B} \cup \mathcal{L}) \setminus U_B)$,
is defined as follows:

\[ \varrho(x) = \begin{cases} \varrho_B(x) & \text{if } x \in B, \\ x & \text{else,} \end{cases} \]

where $\varrho_B$ is an arbitrary bijection $\varrho_B : B \to U_B$.

The application of $\varrho$ to an arbitrary RDF triple $t = (x_1, x_2, x_3)$, denoted by $\varrho[t]$,
results in an RDF triple $t' = (x'_1, x'_2, x'_3)$ such that $x'_i = \varrho(x_i)$ for all $i \in \{1, 2, 3\}$.
Furthermore, the application of $\varrho$ to a valuation $\mu$, denoted by $\varrho[\mu]$, results in a valuation
$\mu'$ such that $\mathrm{dom}(\mu') = \mathrm{dom}(\mu)$ and $\mu'(?v) = \varrho(\mu(?v))$ for all $?v \in \mathrm{dom}(\mu)$.

We now let

\[ G'_1 = \{ \varrho[t] \mid t \in G_1 \} \quad\text{and}\quad G'_2 = \{ \varrho[t] \mid t \in G_2 \} \]
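
The blank-node replacement can be sketched in a few lines. The following is only an illustration under stated assumptions: RDF terms are plain strings, blank nodes are recognized by the prefix `_:`, the fresh URIs are drawn from a made-up namespace `http://example.org/fresh/`, and the mapping is restricted to the terms that actually occur in $G_2$ (the mapping in the text is defined on all RDF terms). The names `make_rho`, `apply_to_triple`, and `apply_to_valuation` are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
TermMap = Callable[[str], str]


def make_rho(g2: Set[Triple]) -> Tuple[TermMap, TermMap]:
    """Build a blank-node-to-URI mapping for g2 together with its inverse."""
    used = {x for t in g2 for x in t}
    blank_nodes = sorted(x for x in used if x.startswith("_:"))
    rho_b: Dict[str, str] = {}
    counter = 0
    for b in blank_nodes:
        # Pick a fresh URI that does not occur among the terms of g2.
        while True:
            uri = f"http://example.org/fresh/{counter}"
            counter += 1
            if uri not in used:
                break
        rho_b[b] = uri
    inverse = {uri: b for b, uri in rho_b.items()}
    return (lambda x: rho_b.get(x, x)), (lambda x: inverse.get(x, x))


def apply_to_triple(f: TermMap, t: Triple) -> Triple:
    """Apply a term mapping to each component of a triple."""
    return (f(t[0]), f(t[1]), f(t[2]))


def apply_to_valuation(f: TermMap, mu: Dict[str, str]) -> Dict[str, str]:
    """Apply a term mapping to every value a valuation assigns."""
    return {var: f(term) for var, term in mu.items()}


g1 = {("_:b1", "http://p", "http://o")}
g2 = g1 | {("http://s", "http://p", "_:b1"), ("http://s", "http://p", "_:b2")}
rho, rho_inv = make_rho(g2)
g1_prime = {apply_to_triple(rho, t) for t in g1}
g2_prime = {apply_to_triple(rho, t) for t in g2}
```

Since the same blank node is always sent to the same fresh URI, joins over that blank node survive when the triples are later spread over different documents.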

The following facts are verified easily: 



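Fact 4. It holds: $G'_1 \subseteq G'_2$, $|G_1| = |G'_1|$, and $|G_2| = |G'_2|$.

Fact 5. For all $j \in \{1, 2\}$ it holds: Let $\mu$ be an arbitrary valuation, then $\varrho[\mu]$ is a
solution for P in $G'_j$ if and only if $\mu$ is a solution for P in $G_j$. More precisely:

\[ \forall \mu \in [\![P]\!]_{G_j} : \varrho[\mu] \in [\![P]\!]_{G'_j} \quad\text{and}\quad \forall \mu' \in [\![P]\!]_{G'_j} : \varrho^{-1}[\mu'] \in [\![P]\!]_{G_j} \]

where $\varrho^{-1}$ denotes the inverse of the bijective mapping $\varrho$.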

We now address the second problem, that is, we construct $W_1$ and $W_2$ (using $G'_1$ and
$G'_2$) in a way that all LD documents which contain RDF triples from $G'_1$ and $G'_2$ are
reachable. To achieve this goal we use a reachable part of another Web of Linked Data
for the construction. We emphasize that this reachable part must be infinite because $G_1$
and $G_2$ may be (countably) infinite. To find a Web of Linked Data with such a reachable
part we make use of $c_{nf}$: Since $c_{nf}$ does not ensure finiteness, we know there exists a
Web of Linked Data $W^* = (D^*, \mathrm{data}^*, \mathrm{adoc}^*)$, a (finite, nonempty) set $S^* \subset \mathcal{U}$ of
seed URIs, and a SPARQL expression $P^*$ such that the $(S^*, c_{nf}, P^*)$-reachable part of
$W^*$ is infinite. Notice, $S^*$ and $P^*$ are not necessarily the same as S and P.

While the $(S^*, c_{nf}, P^*)$-reachable part of $W^*$ presents the basis for our proof, we
cannot use it directly because the data in that part may cause undesired side-effects for
the evaluation of P. To avoid this issue we define an isomorphism $\sigma$ for $W^*$, $S^*$, and
$P^*$ such that the images of $W^*$, $S^*$, and $P^*$ under $\sigma$ do not use any RDF term or query
variable from $G'_2$ and P.

For the definition of $\sigma$ we write U, L, and V to denote the sets of all URIs, literals,
and variables in $G'_2$ and P (recall, neither $G'_2$ nor P contain blank nodes). That is:

\[ U = (\mathrm{terms}(G'_2) \cup \mathrm{terms}(P)) \cap \mathcal{U}, \]
\[ L = (\mathrm{terms}(G'_2) \cup \mathrm{terms}(P)) \cap \mathcal{L}, \text{ and} \]
\[ V = \mathrm{vars}(P) \cup \mathrm{vars}_F(P) \]

where $\mathrm{vars}_F(P)$ denotes the set of all variables in all filter conditions of P (if any).
Similarly to U, L, and V, we write $U^*$, $L^*$, and $V^*$ to denote the sets of all URIs,
literals, and variables in $W^*$, $S^*$, and $P^*$:

\[ U^* = (S^* \cup \mathrm{terms}(\mathrm{AllData}(W^*))) \cap \mathcal{U}, \]
\[ L^* = \mathrm{terms}(\mathrm{AllData}(W^*)) \cap \mathcal{L}, \text{ and} \]
\[ V^* = \mathrm{vars}(P^*) \cup \mathrm{vars}_F(P^*). \]

Moreover, we assume three new sets of URIs, literals, and variables, denoted by $U_{new}$,
$L_{new}$, and $V_{new}$, respectively. For these sets it must hold:

$U_{new} \subset \mathcal{U}$ such that $|U_{new}| = |U|$ and $U_{new} \cap (U \cup U^*) = \emptyset$;
$L_{new} \subset \mathcal{L}$ such that $|L_{new}| = |L|$ and $L_{new} \cap (L \cup L^*) = \emptyset$; and
$V_{new} \subset \mathcal{V}$ such that $|V_{new}| = |V|$ and $V_{new} \cap (V \cup V^*) = \emptyset$.

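Furthermore, we assume three total, bijective mappings:

\[ \sigma_U : U \to U_{new} \qquad \sigma_L : L \to L_{new} \qquad \sigma_V : V \to V_{new} \]

Now we define $\sigma$ as a total, bijective mapping

\[ \sigma : \bigl( (\mathcal{U} \cup \mathcal{B} \cup \mathcal{L} \cup \mathcal{V}) \setminus (U_{new} \cup L_{new} \cup V_{new}) \bigr) \to \bigl( (\mathcal{U} \cup \mathcal{B} \cup \mathcal{L} \cup \mathcal{V}) \setminus (U \cup L \cup V) \bigr) \]

such that for each $x \in \mathrm{dom}(\sigma)$ it holds:

\[ \sigma(x) = \begin{cases} \sigma_U(x) & \text{if } x \in U, \\ \sigma_L(x) & \text{if } x \in L, \\ \sigma_V(x) & \text{if } x \in V, \\ x & \text{else.} \end{cases} \]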



The application of $\sigma$ to an arbitrary valuation $\mu$ and to an arbitrary RDF triple t is defined
in a way that corresponds to the application of $\varrho$ to $\mu$ and t, respectively. An application
of $\sigma$ to further, relevant structures is defined as follows:

- The application of $\sigma$ to the aforementioned Web $W^* = (D^*, \mathrm{data}^*, \mathrm{adoc}^*)$, denoted
by $\sigma[W^*]$, results in a Web of Linked Data $W^{*\prime} = (D^{*\prime}, \mathrm{data}^{*\prime}, \mathrm{adoc}^{*\prime})$
such that $D^{*\prime} = D^*$ and mappings $\mathrm{data}^{*\prime}$ and $\mathrm{adoc}^{*\prime}$ are defined as follows:

\[ \forall d \in D^{*\prime} : \mathrm{data}^{*\prime}(d) = \{ \sigma[t] \mid t \in \mathrm{data}^*(d) \} \]
\[ \forall u \in \mathrm{dom}(\mathrm{adoc}^{*\prime}) : \mathrm{adoc}^{*\prime}(u) = \mathrm{adoc}^*(\sigma^{-1}(u)) \]

where $\mathrm{dom}(\mathrm{adoc}^{*\prime}) = \{ \sigma(u) \mid u \in \mathrm{dom}(\mathrm{adoc}^*) \}$ and $\sigma^{-1}$ is the inverse of $\sigma$.

- The application of $\sigma$ to an arbitrary (SPARQL) filter condition R, denoted by $\sigma[R]$,
results in a filter condition that is defined as follows: i) If R is $?x = c$, $?x = ?y$, or
$\mathrm{bound}(?x)$, then $\sigma[R]$ is $?x' = c'$, $?x' = ?y'$, and $\mathrm{bound}(?x')$, respectively, where
$?x' = \sigma(?x)$, $?y' = \sigma(?y)$, and $c' = \sigma(c)$; and ii) If R is $(\neg R_1)$, $(R_1 \wedge R_2)$, or
$(R_1 \vee R_2)$, then $\sigma[R]$ is $(\neg R'_1)$, $(R'_1 \wedge R'_2)$, or $(R'_1 \vee R'_2)$, respectively, where
$R'_1 = \sigma[R_1]$ and $R'_2 = \sigma[R_2]$.

- The application of $\sigma$ to an arbitrary SPARQL expression P', denoted by $\sigma[P']$,
results in a SPARQL expression that is defined as follows: i) If P' is a triple pattern
$(x'_1, x'_2, x'_3)$, then $\sigma[P']$ is $(x''_1, x''_2, x''_3)$ such that $x''_i = \sigma(x'_i)$ for all $i \in \{1, 2, 3\}$;
and ii) If P' is $(P'_1$ AND $P'_2)$, $(P'_1$ UNION $P'_2)$, $(P'_1$ OPT $P'_2)$, or $(P'_1$ FILTER $R')$, then
$\sigma[P']$ is $(P''_1$ AND $P''_2)$, $(P''_1$ UNION $P''_2)$, $(P''_1$ OPT $P''_2)$, and $(P''_1$ FILTER $R'')$,
respectively, where $P''_1 = \sigma[P'_1]$, $P''_2 = \sigma[P'_2]$, and $R'' = \sigma[R']$.

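We now introduce $W^{*\prime}$, $S^{*\prime}$, and $P^{*\prime}$ as image of $W^*$, $S^*$, and $P^*$ under $\sigma$, respectively:

\[ W^{*\prime} = \sigma[W^*] \qquad S^{*\prime} = \{ \sigma(u) \mid u \in S^* \} \qquad P^{*\prime} = \sigma[P^*] \]

$W^{*\prime}$ is structurally identical to $W^*$. Furthermore, the $(S^{*\prime}, c_{nf}, P^{*\prime})$-reachable part of
$W^{*\prime}$ is infinite because the $(S^*, c_{nf}, P^*)$-reachable part of $W^*$ is infinite. Hereafter, we
write $W_{\mathfrak{R}} = (D_{\mathfrak{R}}, \mathrm{data}_{\mathfrak{R}}, \mathrm{adoc}_{\mathfrak{R}})$ to denote the $(S^{*\prime}, c_{nf}, P^{*\prime})$-reachable part of $W^{*\prime}$.
We now use $W_{\mathfrak{R}}$ to construct Webs of Linked Data that contain all RDF triples from
$G'_1$ and $G'_2$, respectively. Since $W_{\mathfrak{R}}$ is infinite, there exists at least one infinite path in
the link graph of $W_{\mathfrak{R}}$. Let $p = d_1, d_2, ...$ be such a path. Hence, for all $i \in \{1, 2, ...\}$
holds:

\[ d_i \in D_{\mathfrak{R}} \quad\text{and}\quad \exists t \in \mathrm{data}_{\mathfrak{R}}(d_i) : \bigl( \exists u \in \mathrm{uris}(t) : \mathrm{adoc}_{\mathfrak{R}}(u) = d_{i+1} \bigr) \]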

We may use this path for constructing Webs of Linked Data $W_1$ and $W_2$ from $W_{\mathfrak{R}}$
such that $W_1$ and $W_2$ contain the data from $G'_1$ and $G'_2$, respectively. However, to allow
us to use the monotonicity of SPARQL_LD(R) queries in our proof, it is necessary to
construct $W_1$ and $W_2$ such that $W_1$ is an induced subweb of $W_2$. To achieve this goal
we assume a strict total order on $G'_2$ such that each $t \in G'_1 \subseteq G'_2$ comes before any
$t' \in G'_2 \setminus G'_1$ in that order. Formally, we denote this order by infix < and, thus, require
$\forall (t, t') \in G'_1 \times (G'_2 \setminus G'_1) : t < t'$. Furthermore, we assume a total, injective function
$\mathrm{pdoc} : G'_2 \to \{ d \in D_{\mathfrak{R}} \mid d \text{ is on path } p \}$ which is order-preserving, that is, for each
pair $(t, t') \in G'_2 \times G'_2$ holds: If $t < t'$ then LD document $\mathrm{pdoc}(t)$ comes before LD
document $\mathrm{pdoc}(t')$ on path p.

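We now use pdoc, $G'_2$, and $W_{\mathfrak{R}} = (D_{\mathfrak{R}}, \mathrm{data}_{\mathfrak{R}}, \mathrm{adoc}_{\mathfrak{R}})$ to construct a Web of
Linked Data $W_2 = (D_2, \mathrm{data}_2, \mathrm{adoc}_2)$ as follows:

\[ D_2 = D_{\mathfrak{R}} \]
\[ \forall d \in D_2 : \mathrm{data}_2(d) = \begin{cases} \mathrm{data}_{\mathfrak{R}}(d) \cup \{t\} & \text{if } \exists t \in G'_2 : \mathrm{pdoc}(t) = d, \\ \mathrm{data}_{\mathfrak{R}}(d) & \text{else.} \end{cases} \]
\[ \forall u \in \mathrm{dom}(\mathrm{adoc}_{\mathfrak{R}}) : \mathrm{adoc}_2(u) = \mathrm{adoc}_{\mathfrak{R}}(u) \]

In addition to $W_2$, we introduce a Web of Linked Data $W_1 = (D_1, \mathrm{data}_1, \mathrm{adoc}_1)$ that
is an induced subweb of $W_2$ and that is defined by¹

\[ D_1 = \{ d \in D_2 \mid \text{either } d \text{ is not on path } p \text{ or } \exists t \in G'_1 : d = \mathrm{pdoc}(t) \} \]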

The following facts are verified easily: 

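Fact 6. For all $j \in \{1, 2\}$ it holds: $G'_j \subseteq \mathrm{AllData}(W_j) = G'_j \cup \mathrm{AllData}(W_{\mathfrak{R}})$.

Fact 7. For all $j \in \{1, 2\}$ it holds: The $(S^{*\prime}, c_{nf}, P^{*\prime})$-reachable part of $W_j$ is $W_j$
itself.

Fact 8. For all $j \in \{1, 2\}$ it holds: $[\![P]\!]_{G'_j} = [\![P]\!]_{\mathrm{AllData}(W_j)}$.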



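We now consider a SPARQL expression $(P$ UNION $P^{*\prime})$. In the following we write $\tilde{P}$ to
denote this expression. Since $\mathrm{terms}(G'_2) \cap \mathrm{terms}(\mathrm{AllData}(W_{\mathfrak{R}})) = \emptyset$ we conclude
the following facts:

Fact 9. For all $j \in \{1, 2\}$ it holds:

1. The $(S^{*\prime}, c_{nf}, \tilde{P})$-reachable part of $W_j$ is $W_j$ itself.

2. $[\![P]\!]_{\mathrm{AllData}(W_j)} \cup [\![P^{*\prime}]\!]_{\mathrm{AllData}(W_j)} = [\![\tilde{P}]\!]_{\mathrm{AllData}(W_j)}$

3. $[\![P]\!]_{\mathrm{AllData}(W_j)} \cap [\![P^{*\prime}]\!]_{\mathrm{AllData}(W_j)} = \emptyset$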



¹ Recall, any induced subweb is unambiguously defined by specifying its set of LD documents.



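Since $W_1$ is an induced subweb of $W_2$, $\mathcal{Q}_{c_{nf}}^{\tilde{P},S^{*\prime}}$ is monotonic, and $\tilde{P}$ is $(P$ UNION $P^{*\prime})$, we
conclude the following inclusion from Fact 9 and Definition 12:

\[ \bigl( \mathcal{Q}_{c_{nf}}^{\tilde{P},S^{*\prime}}(W_1) \setminus [\![P^{*\prime}]\!]_{\mathrm{AllData}(W_1)} \bigr) \subseteq \bigl( \mathcal{Q}_{c_{nf}}^{\tilde{P},S^{*\prime}}(W_2) \setminus [\![P^{*\prime}]\!]_{\mathrm{AllData}(W_2)} \bigr) \tag{3} \]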

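We now use $W_1$ and $W_2$ and the monotonicity of $\mathcal{Q}_{c_{nf}}^{\tilde{P},S^{*\prime}}$ to show $[\![P]\!]_{G_1} \subseteq [\![P]\!]_{G_2}$ (which
proves that P is monotonic). W.l.o.g., let $\mu$ be an arbitrary solution for P in $G_1$, that
is, $\mu \in [\![P]\!]_{G_1}$. Notice, such a $\mu$ must exist because we assume P is satisfiable (see
before). To prove $[\![P]\!]_{G_1} \subseteq [\![P]\!]_{G_2}$ it suffices to show $\mu \in [\![P]\!]_{G_2}$.
Due to Fact 5 it holds

\[ \varrho[\mu] \in [\![P]\!]_{G'_1} \]

and with Facts 8 and 9 and Definition 12 we have

\[ \varrho[\mu] \in \bigl( \mathcal{Q}_{c_{nf}}^{\tilde{P},S^{*\prime}}(W_1) \setminus [\![P^{*\prime}]\!]_{\mathrm{AllData}(W_1)} \bigr). \]

According to (3) we also have

\[ \varrho[\mu] \in \bigl( \mathcal{Q}_{c_{nf}}^{\tilde{P},S^{*\prime}}(W_2) \setminus [\![P^{*\prime}]\!]_{\mathrm{AllData}(W_2)} \bigr). \]

We now use Definition 12 and Facts 9 and 8 again, to show

\[ \varrho[\mu] \in [\![P]\!]_{G'_2}. \]

Finally, we use Fact 5 again and find

\[ \varrho^{-1}[\varrho[\mu]] \in [\![P]\!]_{G_2}. \]

Since $\varrho^{-1}$ is the inverse of bijective mapping $\varrho$, it holds $\varrho^{-1}[\varrho[\mu]] = \mu$ and, thus, we
conclude $\mu \in [\![P]\!]_{G_2}$.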

D Constant Reachability Criteria 

This section discusses a particular class of reachability criteria which we call constant 
reachability criteria. These criteria always only accept a given, constant set of data 
links. As a consequence, each of these criteria ensures finiteness. In the following we 
formally introduce constant reachability criteria and prove that they ensure finiteness. 

The (fixed) set of data links that a constant reachability criterion accepts may be
specified differently. Accordingly, we distinguish four different types of constant
reachability criteria. Formally, we define them as follows:

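Definition 20. Let $U \subset \mathcal{U}$ be a finite set of URIs and let $T \subset \mathcal{T}$ be a finite set of RDF
triples. The U-constant reachability criterion $c^U$ is a reachability criterion that for
each tuple $(t, u, P) \in \mathcal{T} \times \mathcal{U} \times \mathcal{P}$ is defined as follows:

\[ c^U(t, u, P) = \begin{cases} \mathrm{true} & \text{if } u \in U, \\ \mathrm{false} & \text{else.} \end{cases} \]

The T-constant reachability criterion $c^T$ is a reachability criterion that for each tuple
$(t, u, P) \in \mathcal{T} \times \mathcal{U} \times \mathcal{P}$ is defined as follows:

\[ c^T(t, u, P) = \begin{cases} \mathrm{true} & \text{if } t \in T, \\ \mathrm{false} & \text{else.} \end{cases} \]

The $(U \wedge T)$-constant reachability criterion $c^{U \wedge T}$ is a reachability criterion that for
each tuple $(t, u, P) \in \mathcal{T} \times \mathcal{U} \times \mathcal{P}$ is defined as follows:

\[ c^{U \wedge T}(t, u, P) = \begin{cases} \mathrm{true} & \text{if } u \in U \text{ and } t \in T, \\ \mathrm{false} & \text{else.} \end{cases} \]

The $(U \vee T)$-constant reachability criterion $c^{U \vee T}$ is a reachability criterion that for
each tuple $(t, u, P) \in \mathcal{T} \times \mathcal{U} \times \mathcal{P}$ is defined as follows:

\[ c^{U \vee T}(t, u, P) = \begin{cases} \mathrm{true} & \text{if } u \in U \text{ or } t \in T, \\ \mathrm{false} & \text{else.} \end{cases} \]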

As can be seen from the definition, a U-constant reachability criterion uses a (finite)
set U of URIs to specify the data links it accepts. Similarly, a T-constant reachability
criterion uses a (finite) set T of RDF triples. $(U \wedge T)$-constant reachability criteria and
$(U \vee T)$-constant reachability criteria combine U-constant reachability criteria and
T-constant reachability criteria in a conjunctive and disjunctive manner, respectively. The
reachability criterion $c_{None}$ may be understood as a special case of U-constant
reachability criteria; it uses a U which is empty. Similarly, $c_{None}$ may be understood as the
T-constant reachability criterion for which T is empty.
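
As an illustration, the four criteria of Definition 20 can be written as predicates over tuples (t, u, P). This is a minimal sketch under the assumption that triples are Python tuples of strings and that the SPARQL expression P is an opaque value the criteria ignore; the factory names `c_u`, `c_t`, `c_u_and_t`, and `c_u_or_t` are hypothetical.

```python
from typing import Callable, Set, Tuple

Triple = Tuple[str, str, str]
Criterion = Callable[[Triple, str, object], bool]


def c_u(U: Set[str]) -> Criterion:
    """U-constant criterion: accept a data link iff its URI is in U."""
    return lambda t, u, P: u in U


def c_t(T: Set[Triple]) -> Criterion:
    """T-constant criterion: accept a data link iff its triple is in T."""
    return lambda t, u, P: t in T


def c_u_and_t(U: Set[str], T: Set[Triple]) -> Criterion:
    """(U AND T)-constant criterion: both conditions must hold."""
    return lambda t, u, P: u in U and t in T


def c_u_or_t(U: Set[str], T: Set[Triple]) -> Criterion:
    """(U OR T)-constant criterion: at least one condition must hold."""
    return lambda t, u, P: u in U or t in T


# With an empty U (or an empty T) nothing is accepted, which behaves like c_None.
c_none = c_u(set())

t1 = ("http://a", "p", "http://b")
t2 = ("http://b", "p", "http://c")
U = {"http://b"}
T = {t2}
```

In this toy setting, `c_u_or_t(U, T)` accepts every link that `c_u(U)` or `c_t(T)` accepts, and `c_u_and_t(U, T)` accepts only links that both accept.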
The following facts are trivial to verify: 

Fact 10. Let $U \subset \mathcal{U}$ and $U' \subset \mathcal{U}$ be finite sets of URIs such that $U' \subset U$. Similarly,
let $T \subset \mathcal{T}$ and $T' \subset \mathcal{T}$ be finite sets of RDF triples such that $T' \subset T$. Furthermore,
let $c^U$, $c^{U'}$, $c^T$, and $c^{T'}$ denote the U-constant reachability criterion, the
U'-constant reachability criterion, the T-constant reachability criterion, and the T'-constant
reachability criterion, respectively. Moreover, $c^{U \wedge T}$, $c^{U \vee T}$, $c^{U' \wedge T'}$, and $c^{U' \vee T'}$ denote
the $(U \wedge T)$-constant reachability criterion, the $(U \vee T)$-constant reachability criterion,
the $(U' \wedge T')$-constant reachability criterion, and the $(U' \vee T')$-constant reachability
criterion, respectively. It holds:

1. $c^{U \vee T}$ is less restrictive than $c^U$ and less restrictive than $c^T$.

2. $c^U$ and $c^T$ are less restrictive than $c^{U \wedge T}$, respectively.

3. $c^U$ is less restrictive than $c^{U'}$.

4. $c^T$ is less restrictive than $c^{T'}$.

5. $c^{U \wedge T}$ is less restrictive than $c^{U' \wedge T'}$.

6. $c^{U \vee T}$ is less restrictive than $c^{U' \vee T'}$.

We now show that all constant reachability criteria ensure finiteness: 

Proposition 13. All U-constant, T-constant, $(U \wedge T)$-constant, and $(U \vee T)$-constant
reachability criteria ensure finiteness.



Proof of Proposition 13. To prove that a reachability criterion c ensures finiteness we
have to show that for any Web of Linked Data W, any (finite) set $S \subset \mathcal{U}$ of seed URIs,
and any SPARQL expression P, the (S, c, P)-reachable part of W is finite. W.l.o.g., let
$S' \subset \mathcal{U}$ be an arbitrary (but finite) set of seed URIs, let P' be an arbitrary SPARQL
expression. According to Definition 11 we know that the (S', c, P')-reachable part of
any Web of Linked Data W is finite if the number of LD documents that are (c, P')-reachable
from S' in W is finite. Due to the finiteness of S', it suffices to show that the
set

\[ X(c, P') = \{ (t, u, P) \in \mathcal{T} \times \mathcal{U} \times \mathcal{P} \mid u \in \mathrm{uris}(t) \text{ and } P = P' \text{ and } c(t, u, P) = \mathrm{true} \} \]

is finite for any Web of Linked Data (cf. Definition 10). Notice, the given set presents
an upper bound for all tuples $(t, u, P) \in \mathcal{T} \times \mathcal{U} \times \mathcal{P}$ based on which LD documents
may be reached by applying Definition 10 recursively. Hence, it is not necessarily the
case that all these tuples are discovered (and used) during such a recursive application
in a particular Web of Linked Data.

We now focus on U-constant, T-constant, $(U \wedge T)$-constant, and $(U \vee T)$-constant
reachability criteria. W.l.o.g., let $U' \subset \mathcal{U}$ be an arbitrary, finite set of URIs and let
$T' \subset \mathcal{T}$ be an arbitrary, finite set of RDF triples. Furthermore, let $c^{U'}$, $c^{T'}$, $c^{U' \wedge T'}$, and $c^{U' \vee T'}$
denote the U'-constant reachability criterion, the T'-constant reachability criterion, the
$(U' \wedge T')$-constant reachability criterion, and the $(U' \vee T')$-constant reachability criterion,
respectively.

For $c^{U' \vee T'}$ it holds

\[ |X(c^{U' \vee T'}, P')| \le |U'| + |T'| \]

and, thus, the set $X(c^{U' \vee T'}, P')$ is finite (recall, U' and T' are finite). Therefore, the
$(S', c^{U' \vee T'}, P')$-reachable part of any Web of Linked Data is finite. As discussed before,
this fact shows that $c^{U' \vee T'}$ ensures finiteness. However, we may also use this fact,
together with Proposition 5, case 4, and Fact 10, cases 1 and 2, to show that $c^{U'}$, $c^{T'}$,
and $c^{U' \wedge T'}$ ensure finiteness, respectively. □



